1 Introduction


Nonsmooth optimization addresses the solution of the optimization problem

$$
\begin{aligned}
\min\ & f(x) \\
\text{s.t. } & F_i(x) \le 0 \text{ for all } i = 1,\dots,m ,
\end{aligned}
\tag{1}
$$

where $f, F_i : \mathbb{R}^n \to \mathbb{R}$ are locally Lipschitz continuous. Since $F_i(x) \le 0$ for
all $i = 1,\dots,m$ if and only if $F(x) := \max_{i=1,\dots,m} c_i F_i(x) \le 0$ with constants
$c_i > 0$, and since $F$ is still locally Lipschitz continuous (cf., e.g., Mifflin,
p. 969, Theorem 6 (a)), we can always assume $m = 1$ in (1). Since we do not
take scaling problems of the constraints into account in this paper, we choose
$c_i = 1$ for all $i = 1,\dots,m$ and therefore we always consider the nonsmooth
optimization problem with a single nonsmooth constraint

$$
\begin{aligned}
\min\ & f(x) \\
\text{s.t. } & F(x) \le 0 ,
\end{aligned}
\tag{2}
$$

where $F : \mathbb{R}^n \to \mathbb{R}$ is locally Lipschitz continuous, instead of (1).
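
The reduction from several constraints to a single one can be illustrated with a short sketch. The two constraint functions and the helper name `aggregate` are hypothetical choices made only for illustration, with all scaling constants set to 1 as in the text.

```python
def aggregate(constraints):
    """Return F(x) = max_i F_i(x) for a list of constraint functions."""
    return lambda x: max(Fi(x) for Fi in constraints)

# Two illustrative constraints on R^2 (assumptions, not taken from the paper).
F1 = lambda x: x[0] ** 2 + x[1] ** 2 - 1.0   # unit disc
F2 = lambda x: -x[0]                          # half-plane x_1 >= 0

F = aggregate([F1, F2])

def feasible_all(x):
    """Feasibility with respect to every single constraint."""
    return all(Fi(x) <= 0 for Fi in (F1, F2))

def feasible_single(x):
    """Feasibility with respect to the aggregated constraint F."""
    return F(x) <= 0
```

Both feasibility tests agree at every point, which is the equivalence used to assume $m = 1$.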

Since locally Lipschitz continuous functions are differentiable only almost
everywhere, both $f$ and $F$ may have kinks, and therefore even the attempt to
solve an unconstrained nonsmooth optimization problem with a smooth solver
(e.g., a line search algorithm or a trust region method) by just replacing
the gradient by a subgradient fails in general (cf., e.g., Zowe, p. 461-462):
If $g$ is an element of the subdifferential $\partial f(x)$, then the search direction $-g$
does not need to be a direction of descent (contrary to the behavior of the
gradient of a differentiable function). Furthermore, it can happen that $\{x_k\}$
converges towards a minimizer $\hat{x}$, although the sequence of gradients $\{\nabla f(x_k)\}$
does not converge towards $0$, and therefore we cannot identify $\hat{x}$ as a minimizer.
Moreover, it can happen that $\{x_k\}$ converges towards a point $\hat{x}$, but $\hat{x}$ is not
stationary for $f$. The reason for these problems is that if $f$ is not differentiable
at $x$, then the gradient $\nabla f$ is discontinuous at $x$, and therefore $\nabla f(x)$ does not
give any information about the behavior of $\nabla f$ in a neighborhood of $x$.
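
A small numerical sketch may make the first failure mode concrete. The test function $f(x) = |x_1| + 2|x_2|$, the point and the two subgradients below are assumptions chosen for illustration; they do not come from the cited references.

```python
def f(x):
    """Convex, nonsmooth test function f(x) = |x_1| + 2|x_2|."""
    return abs(x[0]) + 2.0 * abs(x[1])

x = (1.0, 0.0)        # f has a kink here because x_2 = 0
g_bad = (1.0, 2.0)    # a subgradient at x: the subdifferential is {1} x [-2, 2]
g_good = (1.0, 0.0)   # another subgradient at the same point

def step(x, g, t):
    """Move from x along -g with step size t."""
    return (x[0] - t * g[0], x[1] - t * g[1])

steps = (1e-1, 1e-2, 1e-3)
change_bad = [f(step(x, g_bad, t)) - f(x) for t in steps]    # equals 3t > 0
change_good = [f(step(x, g_good, t)) - f(x) for t in steps]  # equals -t < 0
```

Along $-g$ for the first subgradient the function value increases for every small step size, while the second subgradient yields descent, so an arbitrary subgradient is not a reliable replacement for the gradient.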

Not surprisingly, like in smooth optimization, the presence of constraints 
adds additional complexity, since constructing a descent sequence whose limit 
satisfies the constraints is (both theoretically and numerically) much more 
difficult than achieving this aim without the requirement of satisfying any 
restrictions. 

Methods that are able to solve nonsmooth optimization problems are, e.g., 
bundle algorithms which force a descent of the objective function by using 
local knowledge of the function, the R-algorithm by Shor or stochastic 
algorithms that try to approximate the subdifferential. In the following we will 
present a few implementations of these methods. 

Bundle algorithms. Bundle algorithms are iterative methods for solving
nonsmooth optimization problems. They only need to compute one element $g$ of
the subdifferential $\partial f(x)$ per iteration, which in practice is easily computable
by algorithmic differentiation (cf., e.g., Griewank & Corliss). For
computing the search direction, they collect information about the function (e.g.,
subgradients) from previous iterations. This collected information is referred
to as “the bundle”.

As in smooth optimization, convex nonsmooth optimization is much easier
than nonconvex nonsmooth optimization, in theory as well as in practice,
because every local minimizer of a convex function is a global minimizer and the
cutting plane approximation of a convex function always yields an underestimation,
which in particular simplifies convergence analysis. A good introduction to nonsmooth
optimization which treats the convex, unconstrained case in great detail is
Bonnans et al., p. 106 ff. Moreover, very detailed standard references for
nonsmooth nonconvex optimization are Kiwiel and Mäkelä & Neittaanmäki,
which both in particular discuss constrained problems extensively.

Now we give a brief overview of a few bundle algorithms. We start
this overview with the following bundle algorithms that support nonconvex
constraints: The multiobjective proximal bundle method for nonconvex
nonsmooth optimization (MPBNGC) by Mäkelä is a first order method
that uses the improvement function $h_{x_k}(x) := \max\big(f(x) - f(x_k), F(x)\big)$ for
the handling of the constraints. Further details about the proximal bundle
method can be found in Mäkelä & Neittaanmäki. The algorithms in
Mifflin support a nonconvex objective function as well as nonconvex
constraints (cf. Remark 3). NOA by Kiwiel & Stachurski is
a nonsmooth optimization algorithm that handles nonconvex constraints by
using a penalty function or an improvement function, while in the special case
of convex constraints it offers an alternative treatment by the constraint
linearization technique by Kiwiel. The limited memory bundle algorithm
for inequality constrained nondifferentiable optimization by Karmitsa et al.
combines LMBM by Haarala with the feasible directions interior
point technique by Herskovits and Herskovits & Santos for dealing
with the constraints. The search direction is determined by solving a linear
system.

In addition, a few bundle algorithms can only handle convex constraints:
The bundle trust algorithm by Schramm and Schramm & Zowe, which
also supports a nonconvex objective function, handles the constraints by using
the constraint linearization technique by Kiwiel. The bundle filter
algorithm by Fletcher & Leyffer is only applicable to convex optimization
problems and it computes the search direction by solving a linear program.
The bundle-filter method for nonsmooth convex constrained optimization by
Karas et al. is based on the improvement function. The infeasible bundle
method for nonsmooth convex constrained optimization by Sagastizábal &
Solodov is also based on the improvement function, but it uses neither
a penalty function nor a filter.

Moreover, there are some bundle algorithms that support at most linear
constraints: The variable metric bundle method PVAR by Lukšan & Vlček
and Vlček & Lukšan can solve nonsmooth linearly constrained problems
with a nonconvex objective function. The implementation PBUN of the
proximal bundle method by Lukšan & Vlček and Vlček optimizes
a nonconvex objective function, where the feasible set is given by linear
constraints. The proximal bundle method by Kiwiel, which is based on a
restricted step concept, can handle a nonconvex objective function and linear
constraints. The focus of the limited memory bundle method LMBM by
Haarala and Haarala et al. is the treatment of large-scale nonsmooth
nonconvex unconstrained optimization problems. This is done by combining
ideas from the variable metric bundle method by Lukšan & Vlček and Vlček
& Lukšan and limited memory variable metric methods by, e.g., Byrd
et al. Its bound constrained version is presented in Karmitsa & Mäkelä.

All algorithms mentioned above only use first order information of the
objective function and the constraints as input. Nevertheless, there are some
bundle methods that are of particular interest because they are Newton-like
methods (at least in some sense). As far as I know, they so far only support
the handling of linear constraints (except for putting the objective function and
the constraints into a penalty function with a fixed penalty parameter and then
applying the unconstrained algorithm to the penalty function): The quasi-Newton
bundle-type method for nondifferentiable convex optimization by Mifflin et al.
generalizes the idea of quasi-Newton methods to nonsmooth optimization and
it converges superlinearly for strongly convex functions (under some additional
technical assumptions). The bundle-Newton method for nonsmooth
unconstrained minimization by Lukšan & Vlček supports a nonconvex
objective function, it is based on an SQP-approach, and it is the only method
for solving nonsmooth optimization problems that I know of which uses Hessian
information. Furthermore, its rate of convergence is superlinear for strongly
convex, twice continuously differentiable functions. Moreover, a description
of the implementation PNEW of the bundle-Newton method can be found
in Lukšan & Vlček.

In this paper we extend the bundle-Newton method to a second order
bundle algorithm for nonsmooth, nonconvex inequality constraints by using
additional quadratic information: We use second order information of the
constraint (cf. (2)). Furthermore, we use the SQP-approach of the bundle-Newton
method for computing the search direction for the constrained case and
combine it with the idea of quadratic constraint approximation, as it is used, e.g.,
in the sequential quadratically constrained quadratic programming method by
Solodov (this method is not a bundle method), in the hope of obtaining
good feasible iterates, where we only accept strictly feasible points as serious
steps. Therefore, we have to solve a strictly feasible convex QCQP for
computing the search direction (note that this approach also yields a generalization
of the original bundle-Newton method in the unconstrained case). Using such
a QCQP for computing the search direction yields a line search condition for
accepting infeasible points as trial points (which is different from that in, e.g.,
Mifflin). One of the most important properties of the convex QP (that
is used to determine the search direction) with respect to a bundle method is
its strong duality (e.g., for a meaningful termination criterion, for global
convergence, ...), which also holds in the case of strictly feasible convex QCQPs
(cf. Subsection 4.2).

For numerical results we refer the reader to Fendl & Schichl. Proofs
that are presented in this paper can be looked up in explicit detail in Fendl,
p. 25 ff., Chapter 3.

Other algorithms for nonsmooth optimization. There exist several other
methods for solving nonsmooth optimization problems that are not based on
the bundle approach or that are not bundle algorithms in the sense described
above. A few representatives of these methods that support at
most linear constraints are: The algorithm PMIN by Lukšan & Vlček,
which is based on Lukšan, solves linearly constrained minimax optimization
problems, i.e. the objective function must be the maximum of twice
continuously differentiable functions. The robust gradient sampling algorithm
for nonsmooth nonconvex optimization by Burke et al. approximates the
whole subdifferential at each iteration (cf. Burke et al.) and does not make
null steps. The MATLAB code HANSO by Overton combines ideas from
BFGS algorithms (cf. Lewis & Overton) and from the gradient sampling
algorithm by Burke et al. for solving nonsmooth unconstrained optimization
problems. The derivative-free bundle method (DFBM) by Bagirov,
where “derivative-free” means that no derivative information is used explicitly,
can solve linearly constrained nonsmooth problems. The subgradients are
approximated by finite differences in this algorithm (cf. Bagirov). DFBM is
an essential part of the programming library for global and non-smooth
optimization GANSO by Bagirov et al. The discrete gradient method DGM
for nonsmooth nonconvex unconstrained optimization by Bagirov et al. is
a bundle-like method that does not compute subgradients, but approximates
them by discrete gradients. The quasisecant method QSM for minimizing
nonsmooth nonconvex functions by Bagirov & Ganjehlou combines ideas
both from bundle methods and from the gradient sampling method by Burke
et al.

Furthermore, we want to mention the following solver for nonsmooth
convex optimization problems: The oracle based optimization engine OBOE by
Vial & Sawhney is based on the analytic center cutting plane method
by Nesterov & Vial, which is an interior point framework.

Finally, we list a few algorithms that can also handle nonconvex constraints:
The robust sequential quadratic programming algorithm by Curtis & Overton
extends the gradient sampling algorithm to nonconvex, nonsmooth
constrained optimization. SolvOpt by Kappel & Kuntsevich is an
implementation of the R-algorithm by Shor. It handles the constraints by
automatically adapting the penalty parameter. ralg by Kroshko is
another implementation of the R-algorithm by Shor that is only available in
(the interpreted programming language) Python. The constraints are handled
by a filter technique.

Remark 1 Karmitsa et al. give a brief, excellent description of the main
ideas (including very readable pseudocode) of many of the unconstrained
methods resp. the unconstrained versions of the methods which we mentioned
above (for further information visit the online decision tree for nonsmooth
optimization software by Karmitsa).

The paper is organized as follows: In Section 2 we recall the basics of an
SQP-method, which is a common technique in smooth optimization, and we
summarize the most important facts about nonsmooth optimization theory.
In Section 3 we give the theoretical foundation of our second order bundle
algorithm and afterwards we present the algorithm and the line search in
detail. Finally, we show the convergence of the line search and the global
convergence of the algorithm in Section 4.

Throughout the paper we use the following notation: We denote the
nonnegative real numbers by $\mathbb{R}_{\ge 0} := \{x \in \mathbb{R} : x \ge 0\}$. We denote the space of
all symmetric $n \times n$-matrices by $\mathbb{R}^{n \times n}_{\mathrm{sym}}$ (also written $\mathrm{Sym}(n)$). For
$x \in \mathbb{R}^n$ we denote the Euclidean norm of $x$ by $|x|$, and for $A \in \mathrm{Sym}(n)$ we
denote the spectral norm of $A$ by $|A|$. Furthermore, we denote the smallest resp.
the largest eigenvalue of a positive definite matrix $A \in \mathrm{Sym}(n)$ by
$\lambda_{\min}(A)$ resp. $\lambda_{\max}(A)$. Therefore, if $A$ is positive definite, we have

$$
|A^{1/2}| = \lambda_{\max}(A)^{1/2} \tag{3}
$$

(cf., e.g., Golub & van Loan, p. 394, follow-up of Theorem 8.1.2).


2 Optimization theory 

In the following section we summarize the basics of an SQP-method, since
we will approximate a nonsmooth problem by a sequence of smooth
problems to derive our algorithm in Section 3 and hence we will need some facts
about smooth optimization, and we present the most important facts about
nonsmooth optimization theory.


2.1 Smooth optimality conditions & SQP 


Theorem 1 Let $f, F_i : \mathbb{R}^n \to \mathbb{R}$ (with $i = 1,\dots,m$) be continuously
differentiable and $\hat{x} \in \mathbb{R}^n$ be a solution of the smooth optimization problem

$$
\begin{aligned}
\min\ & f(x) \\
\text{s.t. } & F_i(x) \le 0 \text{ for all } i = 1,\dots,m .
\end{aligned}
\tag{4}
$$

Then there exist $\kappa \ge 0$ and $\lambda \ge 0$ with

$$
\begin{aligned}
& \kappa \nabla f(\hat{x})^T + \sum_{i=1}^m \nabla F_i(\hat{x})^T \lambda_i = 0 , \\
& \lambda_i F_i(\hat{x}) = 0 \text{ for all } i = 1,\dots,m , \\
& \kappa = 1 \text{ or } (\kappa = 0,\ \lambda \ne 0) .
\end{aligned}
\tag{5}
$$

If all occurring functions are convex, then the existence of a strictly feasible $x$
(i.e. $F(x) < 0$) always guarantees $\kappa = 1$, and the conditions (5) are sufficient
(for a feasible $\hat{x}$ being a minimizer of (4)).

Proof. Combine, e.g., Schichl & Neumaier, p. 19, 4.1 Theorem and
Boyd & Vandenberghe, p. 243, 5.5.3 KKT optimality conditions. □
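
As a minimal sketch of how the conditions (5) can be checked numerically for $m = 1$, consider the smooth problem $\min x_2$ subject to $x_1^2 - x_2 \le 0$. The function name `kkt_residual` and the chosen test points are assumptions made for illustration.

```python
def grad_f(x):
    """Gradient of f(x) = x_2."""
    return (0.0, 1.0)

def F(x):
    """Single smooth constraint F(x) = x_1^2 - x_2."""
    return x[0] ** 2 - x[1]

def grad_F(x):
    return (2.0 * x[0], -1.0)

def kkt_residual(x, kappa, lam):
    """Return (stationarity residual, complementarity residual) of (5) for m = 1."""
    gf, gF = grad_f(x), grad_F(x)
    stat = tuple(kappa * a + lam * b for a, b in zip(gf, gF))
    return max(abs(s) for s in stat), abs(lam * F(x))

x_hat = (0.0, 0.0)                     # the minimizer of this problem
stat_res, comp_res = kkt_residual(x_hat, kappa=1.0, lam=1.0)

x_other = (1.0, 1.0)                   # feasible boundary point, not a minimizer
stat_other, _ = kkt_residual(x_other, kappa=1.0, lam=1.0)
```

At the minimizer both residuals vanish with $\kappa = \lambda = 1$, whereas at the other boundary point the stationarity residual stays positive for this multiplier.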

One possibility to find a solution of the optimization problem (4) is using an
SQP-method (sequential quadratic programming). An SQP-method minimizes
the quadratic approximation of the Lagrangian $L : \mathbb{R}^n \times \mathbb{R}^m_{\ge 0} \to \mathbb{R}$ given by
$L(x, \lambda) := f(x) + \sum_{i=1}^m F_i(x)\lambda_i$ subject to linearizations of the constraints and
then it uses the obtained minimizer as the new iteration point (or it performs a
line search between the current iteration point and the obtained minimizer to
determine the new iteration point). Since quadratic information is necessary
for this approach, we demand $f, F_i : \mathbb{R}^n \to \mathbb{R}$ (with $i = 1,\dots,m$) to be
$C^2$ in this subsection.


Proposition 1 Let the matrix $\nabla F(x) \in \mathbb{R}^{m \times n}$ (gradient of the constraints)
have full rank (“Constraint qualification”) and let the Hessian of the Lagrangian
with respect to the $x$-components $\nabla^2_{xx} L(x, \lambda) = \nabla^2 f(x) + \sum_{i=1}^m \nabla^2 F_i(x)\lambda_i$ be
positive definite on the tangent space of the constraints, i.e. $d^T \nabla^2_{xx} L(x, \lambda) d > 0$
for all $d \in \mathbb{R}^n$ with $d \ne 0$ and $\nabla F(x) d = 0$ (cf. Nocedal & Wright,
p. 531, Assumption 18.1). Then the SQP-step for optimization problem (4)
is given by the solution of the QP

$$
\begin{aligned}
& f(x) + \min_{d}\ \nabla f(x) d + \tfrac{1}{2} d^T \nabla^2_{xx} L(x, \lambda) d \\
& \text{s.t. } F_i(x) + \nabla F_i(x) d \le 0 \text{ for all } i = 1,\dots,m .
\end{aligned}
\tag{6}
$$

Proof. Straightforward calculations. □


Remark 2 A difficulty of an infeasible SQP-method (e.g., SNOPT by Gill
et al.), i.e. infeasible iteration points $x_k$ may occur, is that the linear
constraints of the QP (6) can be infeasible (cf., e.g., Nocedal & Wright,
p. 535, 18.3 Algorithmic development). Note that this difficulty does not
arise for a feasible SQP-method (e.g., FSQP by Lawrence & Tits),
i.e. only feasible iteration points $x_k$ are accepted, as then $d = 0$ is always
feasible for the QP (6). Nevertheless, in this case it can be difficult to obtain
feasible points that make good progress towards a solution (cf. Remark 4).


2.2 Nonsmooth Optimality conditions 


We gather information on the optimality conditions of the nonsmooth
optimization problem (1) with locally Lipschitz continuous functions
$f, F_i : \mathbb{R}^n \to \mathbb{R}$ for $i = 1,\dots,m$. For this purpose, we closely follow the
exposition in Borwein & Lewis.

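Definition 1 Let $U \subset \mathbb{R}^n$ be open and $f : \mathbb{R}^n \to \mathbb{R}$. We define the Clarke
directional derivative in $x \in U$ in direction $d \in \mathbb{R}^n$ by

$$
f^\circ(x, d) := \limsup_{h \to 0,\ t \downarrow 0} \frac{f(x + h + td) - f(x + h)}{t}
$$

and we define the subdifferential $\partial f(x) \subset \mathbb{R}^n$ of $f$ in $x \in U$ by

$$
\partial f(x) := \operatorname{ch}\{g \in \mathbb{R}^n : g^T d \le f^\circ(x, d) \text{ for all } d \in \mathbb{R}^n\} ,
$$

where ch denotes the convex hull of a set. The elements of $\partial f(x)$ are called
subgradients. We define the set $\partial^2 f(x) \subset \mathbb{R}^{n \times n}_{\mathrm{sym}}$ of the substitutes for the
Hessian of $f$ at $x$ by

$$
\partial^2 f(x) :=
\begin{cases}
\{G\} & \text{if the Hessian } G \text{ of } f \text{ at } x \text{ exists} \\
\mathbb{R}^{n \times n}_{\mathrm{sym}} & \text{else} .
\end{cases}
\tag{7}
$$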
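
The Clarke directional derivative can be explored numerically in one dimension. The following is a crude sampling sketch and not a rigorous evaluation of the limit superior; the function name `clarke_dd_estimate`, the sampling radius and the step sizes are assumptions.

```python
def clarke_dd_estimate(f, x, d, radius=1e-3, samples=41):
    """Sampled estimate of the Clarke directional derivative of f at x in direction d (1D).

    Takes the largest difference quotient over base points x + h with
    |h| <= radius and a few small step sizes t > 0.
    """
    best = float("-inf")
    for i in range(samples):
        h = -radius + 2.0 * radius * i / (samples - 1)
        for t in (1e-4, 1e-5, 1e-6):
            quotient = (f(x + h + t * d) - f(x + h)) / t
            best = max(best, quotient)
    return best

up = clarke_dd_estimate(abs, 0.0, 1.0)       # estimate for |x| at 0, direction +1
down = clarke_dd_estimate(abs, 0.0, -1.0)    # estimate for |x| at 0, direction -1

# For -|x| the classical one-sided derivative at 0 in direction +1 is -1,
# but nearby base points h < 0 produce quotients equal to +1.
neg_abs = lambda z: -abs(z)
concave_case = clarke_dd_estimate(neg_abs, 0.0, 1.0)
```

For $f(x) = |x|$ at $x = 0$ both estimates are close to $1$, which is consistent with $f^\circ(0, d) = |d|$ and hence with the subdifferential $[-1, 1]$.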
We summarize the most important properties of the Clarke directional
derivative and the subdifferential. The following two results are taken from
Borwein & Lewis.

Proposition 2 The subdifferential $\partial f(x)$ is non-empty, convex and compact.
Furthermore, $\partial f : \mathbb{R}^n \to \mathcal{P}(\mathbb{R}^n)$, where $\mathcal{P}(\mathbb{R}^n)$ denotes the power set of $\mathbb{R}^n$,
is locally bounded and upper semicontinuous.

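Theorem 2 (First order nonsmooth optimality conditions) Let $\hat{x}$ be a
local minimizer of (1) and $f, F_i : \mathbb{R}^n \to \mathbb{R}$ (with $i = 1,\dots,m$) be Lipschitz
continuous in a neighborhood of $\hat{x}$. Then there exist $\kappa \ge 0$ and $\lambda \ge 0$ with

$$
\begin{aligned}
& 0 \in \kappa\, \partial f(\hat{x}) + \sum_{i=1}^m \lambda_i\, \partial F_i(\hat{x}) , \\
& \lambda_i F_i(\hat{x}) = 0 \text{ for all } i = 1,\dots,m , \\
& \kappa = 1 \text{ or } (\kappa = 0,\ \lambda \ne 0) .
\end{aligned}
$$

Furthermore, if there exists a direction $d \in \mathbb{R}^n$ that satisfies the (nonsmooth)
constraint qualification

$$
F_j^\circ(\hat{x}, d) < 0 \text{ for all } j \in \{1,\dots,m\} \text{ with } F_j(\hat{x}) = 0 , \tag{8}
$$

then we can always set $\kappa = 1$.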

Corollary 1 Let the constraint qualification (8) be satisfied for (2), then the
optimality condition for (2) reads as follows: There exists $\lambda \ge 0$ with

$$
0 \in \partial f(\hat{x}) + \lambda\, \partial F(\hat{x}) , \quad \lambda F(\hat{x}) = 0 , \quad F(\hat{x}) \le 0 . \tag{9}
$$

Proof. Inserting into Theorem 2 with $m = 1$. □
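
In one dimension the subdifferentials are intervals, so the optimality condition of Corollary 1 can be checked by interval arithmetic. The sketch below uses the hypothetical problem $\min x$ subject to $|x| - 1 \le 0$, whose minimizer is $\hat{x} = -1$; all function names are assumptions.

```python
def subdiff_f(x):
    """Subdifferential of f(x) = x as an interval (f is smooth)."""
    return (1.0, 1.0)

def F(x):
    return abs(x) - 1.0

def subdiff_F(x):
    """Subdifferential of F(x) = |x| - 1 as an interval."""
    if x > 0:
        return (1.0, 1.0)
    if x < 0:
        return (-1.0, -1.0)
    return (-1.0, 1.0)              # kink of |x| at 0

def satisfies_condition(x, lam, tol=1e-12):
    """Check 0 in df(x) + lam * dF(x), lam * F(x) = 0 and F(x) <= 0 for lam >= 0."""
    (a, b), (c, d) = subdiff_f(x), subdiff_F(x)
    lo, hi = a + lam * c, b + lam * d      # interval sum, valid because lam >= 0
    contains_zero = lo <= tol and hi >= -tol
    return lam >= 0 and contains_zero and abs(lam * F(x)) <= tol and F(x) <= tol

at_minimizer = satisfies_condition(-1.0, 1.0)
at_other_boundary = any(satisfies_condition(1.0, lam) for lam in (0.0, 0.5, 1.0, 2.0))
```

The condition holds at $\hat{x} = -1$ with $\lambda = 1$ and fails at the other boundary point $x = 1$ for every tested multiplier.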

Remark 3 The algorithms in Mifflin (for solving nonlinearly
constrained nonsmooth optimization problems) use a fixed point theorem about
certain upper semicontinuous point to set mappings by Merrill as
optimality condition, which is different from an approach with the optimality
conditions in Theorem 2 or Corollary 1.


3 Derivation of the method 

In this section we discuss the theoretical basics of our second order bundle 
algorithm and we give a detailed presentation of the algorithm and the line 
search. 
3.1 Theoretical basics

We assume in this section that the functions $f, F : \mathbb{R}^n \to \mathbb{R}$ are locally
Lipschitz continuous, $g_j \in \partial f(y_j)$, $\hat{g}_j \in \partial F(y_j)$, and let $G_j$ and $\hat{G}_j$ be
approximations of elements in $\partial^2 f(y_j)$ and $\partial^2 F(y_j)$ (cf. (7)), respectively.

Our goal is to determine a local minimizer for the nonsmooth optimization
problem (2)

$$
\begin{aligned}
\min_{x \in \mathbb{R}^n}\ & f(x) \\
\text{s.t. } & F(x) \le 0 ,
\end{aligned}
$$

and therefore we want to find a point that satisfies the first order optimality
conditions (9). To attain this goal, we will propose an extension to the
bundle-Newton method for nonsmooth unconstrained minimization by Lukšan &
Vlček: If we are in the optimization problem (2) at the iteration point
$x_k \in \mathbb{R}^n$ (with iteration index $k$), we want to compute the next trial point
(i.e. the search direction) by approximating both the objective function $f$ and
the constraint $F$ at $x_k$ by a piecewise quadratic function and then perform
a single SQP-step, as defined in Proposition 1, to the resulting optimization
problem.

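Definition 2 Let $J_k \subseteq \{1,\dots,k\}$. We define a quadratic approximation of $f$
resp. $F$ in $y_j \in \mathbb{R}^n$ with damping parameter $\rho_j$ resp. $\hat{\rho}_j \in [0,1]$ for $j \in J_k$ by

$$
\begin{aligned}
f_j^\sharp(x) &:= f(y_j) + g_j^T (x - y_j) + \tfrac{1}{2} \rho_j (x - y_j)^T G_j (x - y_j) \\
F_j^\sharp(x) &:= F(y_j) + \hat{g}_j^T (x - y_j) + \tfrac{1}{2} \hat{\rho}_j (x - y_j)^T \hat{G}_j (x - y_j)
\end{aligned}
\tag{10}
$$

and the corresponding gradients by

$$
g_j^\sharp(x) := \nabla f_j^\sharp(x)^T = g_j + \rho_j G_j (x - y_j) , \quad
\hat{g}_j^\sharp(x) := \nabla F_j^\sharp(x)^T = \hat{g}_j + \hat{\rho}_j \hat{G}_j (x - y_j) . \tag{11}
$$

We define the piecewise quadratic approximation of $f$ resp. $F$ in $x \in \mathbb{R}^n$ by

$$
f_k^\square(x) := \max_{j \in J_k} f_j^\sharp(x) , \quad F_k^\square(x) := \max_{j \in J_k} F_j^\sharp(x) . \tag{12}
$$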
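
The construction in Definition 2 can be sketched in a few lines. The test function $f(x) = |x_1| + x_2^2$, the two bundle points and the helper names `quad_model` and `piecewise_model` are assumptions chosen so that the piecewise quadratic model happens to reproduce $f$ exactly.

```python
def quad_model(val, grad, hess, y, rho):
    """Return x -> val + grad^T (x - y) + 0.5 * rho * (x - y)^T hess (x - y)."""
    def model(x):
        s = [xi - yi for xi, yi in zip(x, y)]
        lin = sum(gi * si for gi, si in zip(grad, s))
        quad = sum(s[i] * hess[i][j] * s[j]
                   for i in range(len(s)) for j in range(len(s)))
        return val + lin + 0.5 * rho * quad
    return model

def piecewise_model(models):
    """Pointwise maximum of the quadratic models, as in (12)."""
    return lambda x: max(m(x) for m in models)

# Hypothetical nonsmooth test function and a bundle with two elements.
f = lambda x: abs(x[0]) + x[1] ** 2
bundle = [
    # (y_j, g_j, G_j): trial point, subgradient, Hessian (which exists at both points)
    ((1.0, 1.0), (1.0, 2.0), [[0.0, 0.0], [0.0, 2.0]]),
    ((-1.0, 0.5), (-1.0, 1.0), [[0.0, 0.0], [0.0, 2.0]]),
]
models = [quad_model(f(y), g, G, y, rho=1.0) for y, g, G in bundle]
f_box = piecewise_model(models)
```

With damping parameter $1$ the two models reduce to $x_1 + x_2^2$ and $-x_1 + x_2^2$, so their maximum coincides with $f$; with damping parameter $0$ the same code yields the usual cutting plane model.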


Hence we approximate the objective function $f$ at $x_k$ by $f_k^\square$ and the
constraint $F$ at $x_k$ by $F_k^\square$ in the optimization problem (2) and then we perform
a single SQP-step to the resulting optimization problem

$$
\begin{aligned}
\min\ & f_k^\square(x) \\
\text{s.t. } & F_k^\square(x) \le 0 .
\end{aligned}
\tag{13}
$$

It is important to observe here that the local model for the nonsmooth problem
(2) is the piecewise quadratic nonsmooth problem (13). This problem in turn
can, however, be equivalently written as a smooth QCQP.

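Proposition 3 The SQP-step $(d, v) \in \mathbb{R}^{n+1}$ for (13) is given by the solution
of the QP

$$
\begin{aligned}
& f(x_k) + \min_{d, v}\ v + \tfrac{1}{2} d^T W^k d \\
& \text{s.t. } -\big(f(x_k) - f_j^k\big) + d^T g_j^k \le v \text{ for all } j \in J_k \\
& \phantom{\text{s.t. }} F(x_k) - \big(F(x_k) - F_j^k\big) + d^T \hat{g}_j^k \le 0 \text{ for all } j \in J_k ,
\end{aligned}
\tag{14}
$$

where

$$
\begin{aligned}
f_j^k &:= f_j^\sharp(x_k) , & g_j^k &:= g_j^\sharp(x_k) = g_j + \rho_j G_j (x_k - y_j) \\
F_j^k &:= F_j^\sharp(x_k) , & \hat{g}_j^k &:= \hat{g}_j^\sharp(x_k) = \hat{g}_j + \hat{\rho}_j \hat{G}_j (x_k - y_j) \\
W^k &:= \sum_{j \in J_{k-1}} \lambda_j^{k-1} \rho_j G_j + \sum_{j \in J_{k-1}} \mu_j^{k-1} \hat{\rho}_j \hat{G}_j
\end{aligned}
\tag{15}
$$

and $\lambda_j^{k-1}$ resp. $\mu_j^{k-1}$ denote the Lagrange multipliers with respect to $f$ resp. $F$
at iteration $k - 1$ for $j \in J_{k-1}$.

Proof. We rewrite (13) as a smooth optimization problem by using (12). If we
are at the iteration point $(x_k, u_k) \in \mathbb{R}^n \times \mathbb{R}$ with $u_k := f(x_k)$ in this smooth
reformulation, then, according to (6) and with the gradients (11) abbreviated as
in (15), the SQP-step for this problem is given by the solution of the QP (14). □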

Since $f_j^\sharp$ resp. $F_j^\sharp$ are only global underestimators for convex $f$ resp. $F$ and
$\rho_j = \hat{\rho}_j = 0$, and since $f_j^\sharp$ resp. $F_j^\sharp$ approximate $f$ resp. $F$ well only for trial
points close to $x_k$, we decrease the activity of non-local information (e.g.,
non-local subgradients) by the following definition.

Definition 3 We define the localized approximation errors of $f$ resp. $F$ by

$$
\alpha_j^k := \max\big(|f(x_k) - f_j^k| ,\ \gamma_1 (s_j^k)^{\omega_1}\big) , \quad
A_j^k := \max\big(|F(x_k) - F_j^k| ,\ \gamma_2 (s_j^k)^{\omega_2}\big) , \tag{16}
$$

where

$$
s_j^k := |y_j - x_j| + \sum_{i=j}^{k-1} |x_{i+1} - x_i| \tag{17}
$$

denotes a locality measure for $j = 1,\dots,k$ with fixed parameters $\gamma_i > 0$ and
$\omega_i \ge 1$ for $i = 1, 2$.

Proposition 4 The locality measure $s_j^k$ has the following properties

$$
s_j^k + |x_{k+1} - x_k| = s_j^{k+1} , \quad s_j^k \ge |y_j - x_k| \text{ for all } j = 1,\dots,k . \tag{18}
$$

Proof. Straightforward calculations. □
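
The locality measure and the localized approximation error can be sketched as follows, together with a numerical check of both properties of Proposition 4. The iterates, trial points and helper names are assumptions made for illustration; the defaults $1$ and $2$ for the coefficient and the exponent are likewise only illustrative.

```python
import math

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def locality(xs, ys, j, k):
    """Locality measure s_j^k from (17); xs[i-1] = x_i, ys[i-1] = y_i, 1 <= j <= k."""
    s = dist(ys[j - 1], xs[j - 1])
    for i in range(j, k):                  # i = j, ..., k-1
        s += dist(xs[i], xs[i - 1])        # |x_{i+1} - x_i|
    return s

def approx_error(f_xk, f_jk, s_jk, gamma=1.0, omega=2.0):
    """Localized approximation error from (16) for given model value f_jk."""
    return max(abs(f_xk - f_jk), gamma * s_jk ** omega)

# Hypothetical iterates x_1..x_4 and trial points y_1..y_4 in R^2.
xs = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
ys = [(0.0, 0.0), (1.5, 0.0), (1.0, 2.0), (2.0, 2.0)]

recursion_ok = all(
    math.isclose(locality(xs, ys, j, k + 1),
                 locality(xs, ys, j, k) + dist(xs[k], xs[k - 1]))
    for k in range(1, 4) for j in range(1, k + 1)
)
bound_ok = all(
    locality(xs, ys, j, k) >= dist(ys[j - 1], xs[k - 1]) - 1e-12
    for k in range(1, 5) for j in range(1, k + 1)
)
```

The bound in (18) is the triangle inequality applied along the path from $y_j$ over the iterates to $x_k$, which is why the measure overestimates the true distance.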

Like the bundle-Newton method by Lukšan & Vlček, our algorithm
uses a convex search direction problem and therefore we modify (14) in the
following sense.

Proposition 5 If we generalize (14) by using the localized approximation
errors (16) and replacing $W^k$ by a positive definite modification $W_p^k$ (e.g., the
Gill-Murray factorization by Gill & Murray), then the generalized
version of (14) reads

$$
\begin{aligned}
& f(x_k) + \min_{d, v}\ v + \tfrac{1}{2} d^T W_p^k d \\
& \text{s.t. } -\alpha_j^k + d^T g_j^k \le v \text{ for all } j \in J_k \\
& \phantom{\text{s.t. }} F(x_k) - A_j^k + d^T \hat{g}_j^k \le 0 \text{ for all } j \in J_k .
\end{aligned}
\tag{19}
$$

Proof. Replace $f(x_k) - f_j^k$ by $\alpha_j^k$, $F(x_k) - F_j^k$ by $A_j^k$ and $W^k$ by $W_p^k$ in (14). □
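
To illustrate what a positive definite modification does, the sketch below uses a plain eigenvalue shift for a symmetric $2 \times 2$ matrix. This is deliberately not the Gill-Murray factorization mentioned above but a simpler stand-in; the function names and the threshold `delta` are assumptions.

```python
import math

def eig_sym_2x2(W):
    """Eigenvalues (smallest, largest) of a symmetric 2x2 matrix."""
    a, b, c = W[0][0], W[0][1], W[1][1]
    mean = 0.5 * (a + c)
    rad = math.hypot(0.5 * (a - c), b)
    return mean - rad, mean + rad

def shift_to_pd(W, delta=1e-6):
    """Return W + tau * I with the smallest tau >= 0 such that lambda_min >= delta."""
    lam_min, _ = eig_sym_2x2(W)
    tau = max(0.0, delta - lam_min)
    return [[W[0][0] + tau, W[0][1]], [W[1][0], W[1][1] + tau]]

W = [[1.0, 2.0], [2.0, 1.0]]        # indefinite: eigenvalues -1 and 3
W_p = shift_to_pd(W)

W_already_pd = [[2.0, 0.0], [0.0, 3.0]]
unchanged = shift_to_pd(W_already_pd)
```

A shift changes all eigenvalues by the same amount, whereas a modified factorization typically perturbs the matrix more selectively; the sketch only shows that the resulting search direction problem becomes convex.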

Remark 4 The standard SQP approach for smooth optimization problems
suffers from the Maratos effect (Maratos), which, in general, prevents
infeasible SQP-methods from getting a descent in the merit function and feasible
SQP-methods from finding (good) feasible points (cf. Tits, p. 1003 and
Example 1). Some well known techniques for avoiding the Maratos effect are
replacing the merit function by an augmented Lagrangian, using second order
corrections, using a watchdog technique (which is a non-monotone line search)
(cf., e.g., Nocedal & Wright, p. 440, 15.5 The Maratos effect), or a
quadratic approximation of the constraints (cf. Solodov). We will choose
the quadratic constraint approximation approach to avoid the Maratos effect,
which makes the search direction problem slightly more difficult to solve than
a QP, but, as we will see, still guarantees strong duality, which is necessary for
proving convergence of our bundle method.

Example 1 Consider the optimization problem (2) with $f, F : \mathbb{R}^2 \to \mathbb{R}$, where
$f(x) := x_2$ and $F(x) := x_1^2 - x_2$. Then this problem has the (global) minimizer
$\hat{x} = 0$. Furthermore, it is smooth and consequently its SQP-direction, which
is obtained by solving the QP (6), at the iteration $k = 0$ at the iteration point
$(x_k, \lambda_k) := (-1, 1 + 10^{-8}, 1)$, which implies that $x_k$ is close to the boundary,
is given by $d_k = (1, -2)$. Since we have for $t \in [0, 1]$ that $F(x_k + t d_k) \le 0$
if and only if $t \le 10^{-4}$, a feasible SQP-method can only make a tiny step
towards the solution $\hat{x}$ on the standard SQP-direction in this example, and
similar observations can be made for any other iteration point $x_k$ that is
close to the boundary (note that the objective function $f$ has no impact on
the Hessian of the Lagrangian in the QP (6) in this example).
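
The step length restriction in Example 1 can be reproduced with a few lines. The sketch assumes an iteration point $x_k = (-1, 1 + \varepsilon)$ with a small $\varepsilon > 0$ (the value `1e-8` below is an assumption) and solves the QP (6) in closed form, which gives $d = (1, -2 - \varepsilon)$ and thus agrees with the direction stated in the example up to $\varepsilon$.

```python
import math

def F(x):
    """Constraint of Example 1."""
    return x[0] ** 2 - x[1]

def sqp_direction(eps):
    """Closed-form solution of the QP (6) at x_k = (-1, 1 + eps) with multiplier 1.

    The QP reads: min d_2 + d_1^2  s.t.  -eps - 2 d_1 - d_2 <= 0. The constraint
    is active at the solution, which yields d = (1, -2 - eps).
    """
    return (1.0, -2.0 - eps)

def max_feasible_step(eps):
    """Largest t >= 0 with F(x_k + t d) <= 0: positive root of t^2 + eps*t - eps."""
    return 0.5 * (-eps + math.sqrt(eps * eps + 4.0 * eps))

def point_on_ray(eps, t):
    d = sqp_direction(eps)
    return (-1.0 + t * d[0], 1.0 + eps + t * d[1])

eps = 1e-8
t_max = max_feasible_step(eps)
```

The admissible step length behaves like $\sqrt{\varepsilon}$, so the closer the iteration point is to the boundary, the shorter the feasible step along the SQP-direction becomes.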

Remark 4 leads to the following idea: Let $\bar{G}_j^k, \bar{\hat{G}}_j^k \in \mathrm{Sym}(n)$ be positive
definite (e.g., positive definite modifications of $G_j \in \partial^2 f(y_j)$ resp. $\hat{G}_j \in \partial^2 F(y_j)$;
also cf. Remark 10). Then we can try to determine the search direction by
solving the convex QCQP

$$
\begin{aligned}
& f(x_k) + \min_{d, v}\ v + \tfrac{1}{2} d^T W_p^k d \\
& \text{s.t. } -\alpha_j^k + d^T g_j^k + \tfrac{1}{2} d^T \bar{G}_j^k d \le v \text{ for all } j \in J_k \\
& \phantom{\text{s.t. }} F(x_k) - A_j^k + d^T \hat{g}_j^k + \tfrac{1}{2} d^T \bar{\hat{G}}_j^k d \le 0 \text{ for all } j \in J_k
\end{aligned}
\tag{20}
$$

instead of the QP (19), i.e. instead of just demanding that the first order
approximations are feasible, we demand that the first order approximations
must be the more feasible, the more we move away from $x_k$.

Example 2 We consider the optimization problem (2) with $f(x) := x_2$ and
$F(x) := \max\big(\min(F_1(x), F_2(x)), F_3(x)\big)$, where $F_1(x) := x_1^2 + x_2^2$,
$F_2(x) := -x_1 + x_2^2$ and $F_3(x) := x_1 - 2$, and we assume that we are at the
iteration point $x_k := 0$.

Since $\hat{F}(x) := \max\big(F_2(x), F_3(x)\big)$ is convex, and since an easy examination
yields that $F(x) \le 0 \iff \hat{F}(x) \le 0$, the feasible set of our optimization
problem (2) is convex. Therefore, the linearity of $f$ implies that our optimization
problem has the unique minimizer $\hat{x} := (2, -\sqrt{2})$.

The quadratic approximation of $F$ with respect to $x_k$ in the QCQP (20)
reads $F_1(x_k + d) \le 0$, i.e. $d = 0$ is the only feasible point for the QCQP (20)
and therefore its solution, although $x_k = 0$ is not a stationary point for our
optimization problem (for this consider $f$), resp. much less a minimizer (since
$\hat{x}$ is the unique minimizer of our optimization problem). As can be seen, e.g.,
from considering the restriction of $F$ to $x_2 = 0$, the reason for the occurrence
of $d = 0$ at $x_k$ is the nonconvexity of $F$ (which is a result of the presence of
the min-function in $F$), although the feasible set is convex.

Notice that if we substitute $F$ by $\hat{F}$ in the constraint of our optimization
problem, which yields the same feasible set, the difficulty which we described
above does not occur.
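
The claims of Example 2 can be checked on a grid. The sketch below assumes the three functions $x_1^2 + x_2^2$, $-x_1 + x_2^2$ and $x_1 - 2$ for the example; the grid and the variable names are illustrative choices.

```python
F1 = lambda x: x[0] ** 2 + x[1] ** 2
F2 = lambda x: -x[0] + x[1] ** 2
F3 = lambda x: x[0] - 2.0

F = lambda x: max(min(F1(x), F2(x)), F3(x))    # nonconvex constraint function
F_hat = lambda x: max(F2(x), F3(x))            # convex replacement

# Grid on [-1, 3] x [-2, 2] with spacing 0.25 (exactly representable floats).
grid = [(-1.0 + 0.25 * i, -2.0 + 0.25 * j) for i in range(17) for j in range(17)]

same_feasible_set = all((F(x) <= 0) == (F_hat(x) <= 0) for x in grid)

# The quadratic model constraint F1(d) <= 0 admits only d = 0.
model_feasible = [x for x in grid if F1(x) <= 0]

# Midpoint test along x_2 = 0: convexity would require F(mid) <= average.
a, b, mid = (-0.5, 0.0), (0.5, 0.0), (0.0, 0.0)
convexity_violated = F(mid) > 0.5 * (F(a) + F(b))
```

Both constraint functions describe the same feasible set on the grid, the midpoint test confirms that the original constraint function is not convex along $x_2 = 0$, and the model constraint leaves only the zero direction.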


Remark 5 If $F(x_k) \le 0$, (19) as well as (20) is always feasible and therefore
we do not have to deal with infeasible search direction problems as they occur
in infeasible SQP-methods (cf. Remark 2). Nevertheless, we have to demand
$F(x_k) < 0$, since otherwise it can happen that $d_k = 0$ is the only feasible point
and therefore the solution of (20), but $x_k$ is not stationary for (2), as Example
2 showed. This is similar to difficulties arising in smooth problems at saddle
points of the constraints.


Now we state the dual search direction problem, which plays an important
role for proving the global convergence of the method (cf. Subsection 4.2).

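Proposition 6 The dual problem of the QCQP (20) is given by

$$
\begin{aligned}
f(x_k) - \min_{\lambda, \mu}\ & \tfrac{1}{2} \Big| H_k(\lambda, \mu) \Big( \sum_{j \in J_k} \lambda_j g_j^k + \mu_j \hat{g}_j^k \Big) \Big|^2
+ \sum_{j \in J_k} \lambda_j \alpha_j^k + \sum_{j \in J_k} \mu_j A_j^k
- \Big( \sum_{j \in J_k} \mu_j \Big) F(x_k) \\
\text{s.t. } & \lambda_j \ge 0 ,\ \mu_j \ge 0 \text{ for all } j \in J_k , \quad \sum_{j \in J_k} \lambda_j = 1 ,
\end{aligned}
\tag{21}
$$

where $H_k(\lambda, \mu) := \big( W_p^k + \sum_{j \in J_k} \lambda_j \bar{G}_j^k + \mu_j \bar{\hat{G}}_j^k \big)^{-1/2}$. If $F(x_k) < 0$, then the
duality gap is zero, and, furthermore, if we denote the minimizer of the dual
problem (21) by $(\lambda^k, \mu^k)$, then the minimizer $(d_k, v_k)$ of the primal QCQP
(20) satisfies

$$
\begin{aligned}
d_k &= -H_k(\lambda^k, \mu^k)^2 \Big( \sum_{j \in J_k} \lambda_j^k g_j^k + \sum_{j \in J_k} \mu_j^k \hat{g}_j^k \Big) \\
v_k &= \Big( \sum_{j \in J_k} \lambda_j^k g_j^k \Big)^T d_k - \sum_{j \in J_k} \lambda_j^k \alpha_j^k
+ \tfrac{1}{2} d_k^T \Big( \sum_{j \in J_k} \lambda_j^k \bar{G}_j^k \Big) d_k \\
&= -d_k^T W_p^k d_k - \tfrac{1}{2} d_k^T \Big( \sum_{j \in J_k} \lambda_j^k \bar{G}_j^k + \mu_j^k \bar{\hat{G}}_j^k \Big) d_k
- \sum_{j \in J_k} \lambda_j^k \alpha_j^k - \sum_{j \in J_k} \mu_j^k A_j^k
- \Big( \sum_{j \in J_k} \mu_j^k \Big) \big( -F(x_k) \big) .
\end{aligned}
$$
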

Proof. The Lagrangian of (20) is given by
$L(d, v, \lambda, \mu) := v + \tfrac{1}{2} d^T W_p^k d + \sum_{j \in J_k} \lambda_j F_j^1(d, v) + \sum_{j \in J_k} \mu_j F_j^2(d, v)$,
where $F_j^1(d, v) := -\alpha_j^k + d^T g_j^k + \tfrac{1}{2} d^T \bar{G}_j^k d - v$ and
$F_j^2(d, v) := F(x_k) - A_j^k + d^T \hat{g}_j^k + \tfrac{1}{2} d^T \bar{\hat{G}}_j^k d$. Consequently, the equality
constraint of the dual problem reads

$$
\Big( W_p^k + \sum_{j \in J_k} \lambda_j \bar{G}_j^k + \mu_j \bar{\hat{G}}_j^k \Big) d + \sum_{j \in J_k} \lambda_j g_j^k + \mu_j \hat{g}_j^k = 0 , \quad
\sum_{j \in J_k} \lambda_j = 1 . \tag{22}
$$

Rewriting $\tfrac{1}{2} d^T W_p^k d = -\tfrac{1}{2} d^T W_p^k d + d^T W_p^k d$ in $L$ and factoring out $d$ in the
latter summand as well as $v$, these terms vanish according to (22). Now, expressing $d$ in
(22) and inserting it into $L$ yield the desired form of the dual objective function.

Since the primal problem is convex and (because of the assumption $F(x_k) < 0$)
strictly feasible, strong duality holds due to Boyd & Vandenberghe,
Section 5.2.3. Therefore the optimal primal and dual objective function values
coincide and we can express $v_k$ using this equality. Using (22), the
optimality conditions for the QCQP (20) and straightforward calculations yield the
desired formulas for $v_k$. □
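
Strong duality can be observed numerically in a one-dimensional instance of the QCQP with a single bundle element, where the multiplier of the objective constraint equals $1$ and only one dual variable remains. All data values below are hypothetical, the constant $f(x_k)$ is omitted on both sides, and the golden-section search is merely a convenient way to minimize the one-dimensional convex dual objective.

```python
import math

# Illustrative 1D data with one bundle element (all values are assumptions).
W, G_bar, G_hat = 1.0, 1.0, 1.0      # modified Hessian term and the two curvature terms
g, g_hat = -2.0, 1.0                 # model gradients of objective and constraint
alpha, A, F_k = 0.0, 0.0, -0.5       # approximation errors and strictly feasible F(x_k)

def primal():
    """Minimize -alpha + g*d + 0.5*(W + G_bar)*d^2 over the feasible interval."""
    # F_k - A + g_hat*d + 0.5*G_hat*d^2 <= 0 describes an interval [lo, hi].
    disc = math.sqrt(g_hat ** 2 - 2.0 * G_hat * (F_k - A))
    lo, hi = (-g_hat - disc) / G_hat, (-g_hat + disc) / G_hat
    d = min(max(-g / (W + G_bar), lo), hi)       # clamp the unconstrained minimizer
    return d, -alpha + g * d + 0.5 * (W + G_bar) * d * d

def dual_objective(mu):
    """Function minimized in the dual problem for a single bundle element."""
    q = g + mu * g_hat
    return 0.5 * q * q / (W + G_bar + mu * G_hat) + alpha + mu * A - mu * F_k

def dual(lo=0.0, hi=10.0, iters=200):
    """Golden-section search on [lo, hi]; the dual objective is convex in mu."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        a, b = hi - r * (hi - lo), lo + r * (hi - lo)
        if dual_objective(a) < dual_objective(b):
            hi = b
        else:
            lo = a
    mu = 0.5 * (lo + hi)
    return mu, -dual_objective(mu)

d_k, primal_value = primal()
mu_k, dual_value = dual()
d_from_dual = -(g + mu_k * g_hat) / (W + G_bar + mu_k * G_hat)
```

For this data the primal and dual values agree at $5 - 4\sqrt{2}$, and the direction recovered from the dual solution matches the primal minimizer $\sqrt{2} - 1$.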


3.2 Presentation of the algorithm 

The method described in Algorithm 3 works according to the following scheme:
After choosing a strictly feasible starting point $x_1 \in \mathbb{R}^n$ and setting up a
few positive definite matrices, we compute the localized approximation errors.
Then we solve a convex QCQP to determine the search direction, where the
quadratic constraints of the QCQP serve to obtain preferably feasible points
that yield a good descent. After computing the aggregated data and the
predicted descent as well as testing the termination criterion, we perform a line
search (see Algorithm 4) on the ray given by the search direction. This yields
a trial point $y_{k+1}$ that has the following property: Either $y_{k+1}$ is strictly
feasible and the objective function achieves sufficient descent (serious step), or
$y_{k+1}$ is strictly feasible and the model of the objective function changes
sufficiently (null step with respect to the objective function), or $y_{k+1}$ is not strictly
feasible and the model of the constraint changes sufficiently (null step with
respect to the constraint). Afterwards we update the iteration point $x_{k+1}$ and
the information stored in the bundle. Now, we repeat this procedure until the
termination criterion is satisfied.

Algorithm 3. 0. Initialization: Choose the following parameters, which will
not be changed during the algorithm:

Table 1: Initial parameters

| General | Default | Description |
|---|---|---|
| $x_1 \in \mathbb{R}^n$ | | Strictly feasible initial point |
| $y_1 = x_1$ | | Initial trial point |
| $\varepsilon \ge 0$ | | Final optimality tolerance |
| $M \ge 2$ | $M = n + 3$ | Maximal bundle dimension |
| $t_0 \in (0,1)$ | $t_0 = 0.001$ | Initial lower bound for step size of serious step in line search |
| $\hat{t}_0 \in (0,1)$ | $\hat{t}_0 = 0.001$ | Scaling parameter for $t_0$ |
| $m_L \in (0, \tfrac{1}{2})$ | $m_L = 0.01$ | Descent parameter for serious step in line search |
| $m_R \in (m_L, 1)$, $m_f \in [0,1]$ | $m_R = 0.5$ | Parameter for change of model of objective function for short serious and null steps in line search |
| $m_F \in (0,1)$ | $m_F = 0.01$ | Parameter for change of model of constraint for short serious and null steps in line search |
| $\zeta \in (0, \tfrac{1}{2})$ | $\zeta = 0.01$ | Coefficient for interpolation in line search |
| $\vartheta \ge 1$ | $\vartheta = 1$ | Exponent for interpolation in line search |
| $C_S > 0$ | $C_S = 10^{50}$ | Upper bound of the distance between $x_k$ and $y_k$ |
| $C_G > 0$ | $C_G = 10^{50}$ | Upper bound of the norm of the damped matrices $\{\rho_j G_j\}$ ($\lvert \rho_j G_j \rvert \le C_G$) |
| $\hat{C}_G > 0$ | | Upper bound of the norm of the damped matrices $\{\hat{\rho}_j \hat{G}_j\}$ ($\lvert \hat{\rho}_j \hat{G}_j \rvert \le \hat{C}_G$) |
| $\bar{C}_G > 0$ | | Upper bound of the norm of the matrices $\{\bar{G}_j^k\}$ and $\{\bar{G}^k\}$ ($\max(\lvert \bar{G}_j^k \rvert, \lvert \bar{G}^k \rvert) \le \bar{C}_G$) |
| $\bar{\hat{C}}_G > 0$ | | Upper bound of the norm of the matrices $\{\bar{\hat{G}}_j^k\}$ and $\{\bar{\hat{G}}^k\}$ ($\max(\lvert \bar{\hat{G}}_j^k \rvert, \lvert \bar{\hat{G}}^k \rvert) \le \bar{\hat{C}}_G$) |
| $i_\rho \ge 0$ | $i_\rho = 3$ | Selection parameter for $\rho_{k+1}$ (see the associated remark) |
| $i_l \ge 0$ | | Line search selection parameter (see the associated remark) |
| $i_m \ge 0$ | | Matrix selection parameter (see the associated remark) |
| $i_r \ge 0$ | | Bundle reset parameter (see the associated remark) |
| $\gamma_1 > 0$ | $\gamma_1 = 1$ | Coefficient for locality measure for objective function |
| $\gamma_2 > 0$ | $\gamma_2 = 1$ | Coefficient for locality measure for constraint |
| $\omega_1 \ge 1$ | $\omega_1 = 2$ | Exponent for locality measure for objective function |
| $\omega_2 \ge 1$ | $\omega_2 = 2$ | Exponent for locality measure for constraint |


Set the initial values of the data which are changed during the algorithm:

    $i_n = 0$ (# subsequent null and short steps)
    $i_s = 0$ (# subsequent serious steps)
    $J_1 = \{1\}$ (set of bundle indices) .

Compute the following information at the initial trial point

    $f_p^1 = f_1^1 = f(y_1)$                                                      (23)
    $g_p^1 = g_1^1 = g(y_1) \in \partial f(y_1)$                                  (24)
    $G_p^1 = G_1$ approximating $G(y_1) \in \partial^2 f(y_1)$                    (25)
    $F_p^1 = F_1^1 = F(y_1) < 0$ ($y_1$ is strictly feasible according to assumption)   (26)
    $\hat g_p^1 = \hat g_1^1 = \hat g(y_1) \in \partial F(y_1)$                   (27)
    $\hat G_p^1 = \hat G_1$ approximating $\hat G(y_1) \in \partial^2 F(y_1)$     (28)

and set

    $s_p^1 = \hat s_p^1 = s_1^1 = 0$ (locality measure)                           (29)
    $\hat\rho_1 = \rho_1 = 1$ (damping parameter)
    $\kappa^1 = 1$ (Lagrange multiplier for optimality condition)
    $k = 1$ (iterator) .

1. Determination of the matrices for the QCQP:

    if (step $k-1$ and $k-2$ were serious steps) $\wedge$ ($\lambda_{k-1}^{k-1} = 1 \vee i_s > i_r$)
       (the case $i_s > i_r$ corresponds to a bundle reset)
        $W = G_k + \kappa^k \hat G_k$                                             (30)
    else
        $W = G_p^k + \kappa^k \hat G_p^k$                                         (31)
    end

    if $i_n \le i_m + i_l$
        $W_p^k$ = "positive definite modification of $W$"
    else
        $W_p^k = W_p^{k-1}$                                                       (32)
    end

    if $i_n < i_m + i_l$ (i.e. # of subsequent null and short steps < the fixed
    number $i_m + i_l$)
        $(\bar G^k, \hat{\bar G}^k)$ = "positive definite modification of $(G_p^k, \hat G_p^k)$"
        $(\bar G_j^k, \hat{\bar G}_j^k)$ = "positive definite modification of $(G_j, \hat G_j)$" for all $j \in J_k$   (33)
    else if $i_n = i_m + i_l$
        $(\bar G^k, \hat{\bar G}^k)$ = "positive definite modification of $(G_p^k, \hat G_p^k)$"
        $(\bar G_j^k, \hat{\bar G}_j^k) = (\bar G^k, \hat{\bar G}^k)$ for all $j \in J_k$              (34)
    else (i.e. at least $i_m + i_l$ subsequent null and short steps were executed)
        $(\bar G^k, \hat{\bar G}^k) = (\bar G^{k-1}, \hat{\bar G}^{k-1})$ ,
        $(\bar G_j^k, \hat{\bar G}_j^k) = (\bar G^{k-1}, \hat{\bar G}^{k-1})$ for all $j \in J_k$      (35)
    end

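2. Computation of the localized approximation errors:

    $\alpha_j^k := \max(|f(x_k) - f_j^k|, \gamma_1 (s_j^k)^{\omega_1})$ ,  $\alpha_p^k := \max(|f(x_k) - f_p^k|, \gamma_1 (s_p^k)^{\omega_1})$        (36)

    $A_j^k := \max(|F(x_k) - F_j^k|, \gamma_2 (s_j^k)^{\omega_2})$ ,  $A_p^k := \max(|F(x_k) - F_p^k|, \gamma_2 (\hat s_p^k)^{\omega_2})$ .       (37)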

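3. Determination of the search direction: Compute the solution $(d_k, \hat v_k) \in \mathbb{R}^{n+1}$
of the (convex) QCQP

    $\min_{d, \hat v} \; \hat v + \tfrac12 d^T W_p^k d$ ,

    s.t. $-\alpha_j^k + d^T g_j^k + \tfrac12 d^T \bar G_j^k d \le \hat v$            for $j \in J_k$
         $-\alpha_p^k + d^T g_p^k + \tfrac12 d^T \bar G^k d \le \hat v$              if $i_s \le i_r$                  (38)
         $F(x_k) - A_j^k + d^T \hat g_j^k + \tfrac12 d^T \hat{\bar G}_j^k d \le 0$   for $j \in J_k$
         $F(x_k) - A_p^k + d^T \hat g_p^k + \tfrac12 d^T \hat{\bar G}^k d \le 0$     if $i_s \le i_r$

and its corresponding Lagrange multiplier $(\lambda^k, \lambda_p^k, \mu^k, \mu_p^k) \in \mathbb{R}^{2(|J_k|+1)}_{\ge 0}$, i.e.

    $d_k = -H_k^2 \big( \sum_{j \in J_k} \lambda_j^k g_j^k + \lambda_p^k g_p^k + \sum_{j \in J_k} \mu_j^k \hat g_j^k + \mu_p^k \hat g_p^k \big)$        (39)

    $\hat v_k = -d_k^T W_p^k d_k - \tfrac12 d_k^T \big( \sum_{j \in J_k} \lambda_j^k \bar G_j^k + \lambda_p^k \bar G^k + \sum_{j \in J_k} \mu_j^k \hat{\bar G}_j^k + \mu_p^k \hat{\bar G}^k \big) d_k$
          $- \sum_{j \in J_k} \lambda_j^k \alpha_j^k - \lambda_p^k \alpha_p^k - \sum_{j \in J_k} \mu_j^k A_j^k - \mu_p^k A_p^k - \big( \sum_{j \in J_k} \mu_j^k + \mu_p^k \big) (-F(x_k))$ ,   (40)

where

    $H_k := \big( W_p^k + \sum_{j \in J_k} \lambda_j^k \bar G_j^k + \lambda_p^k \bar G^k + \sum_{j \in J_k} \mu_j^k \hat{\bar G}_j^k + \mu_p^k \hat{\bar G}^k \big)^{-\frac12}$ .        (41)

Set

    $\kappa^{k+1} := \sum_{j \in J_k} \mu_j^k + \mu_p^k$ ,

    $(\kappa_j^k, \kappa_p^k) := (\mu_j^k, \mu_p^k) / \kappa^{k+1}$   for $\kappa^{k+1} > 0$ ,
    $(\kappa_j^k, \kappa_p^k) := 0$                                   for $\kappa^{k+1} = 0$ .                        (42)

    if $i_s > i_r$
        $i_s = 0$ (bundle reset)
    end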

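4. Aggregation: We set for the aggregation of information of the objective
function

    $(\tilde g_p^k, \tilde f_p^k, G_p^{k+1}, \tilde s_p^k) = \sum_{j \in J_k} \lambda_j^k (g_j^k, f_j^k, \rho_j G_j, s_j^k) + \lambda_p^k (g_p^k, f_p^k, G_p^k, s_p^k)$        (43)

    $\tilde\alpha_p^k = \max(|f(x_k) - \tilde f_p^k|, \gamma_1 (\tilde s_p^k)^{\omega_1})$                                       (44)

and for the aggregation of information of the constraint

    $(\tilde{\hat g}_p^k, \tilde F_p^k, \hat G_p^{k+1}, \tilde{\hat s}_p^k) = \sum_{j \in J_k} \kappa_j^k (\hat g_j^k, F_j^k, \hat\rho_j \hat G_j, s_j^k) + \kappa_p^k (\hat g_p^k, F_p^k, \hat G_p^k, \hat s_p^k)$   (45)

    $\tilde A_p^k = \max(|F(x_k) - \tilde F_p^k|, \gamma_2 (\tilde{\hat s}_p^k)^{\omega_2})$                                     (46)

and we set

    $v_k = -d_k^T W_p^k d_k - \tfrac12 d_k^T \big( \sum_{j \in J_k} \lambda_j^k \bar G_j^k + \lambda_p^k \bar G^k + \sum_{j \in J_k} \mu_j^k \hat{\bar G}_j^k + \mu_p^k \hat{\bar G}^k \big) d_k$
        $- \tilde\alpha_p^k - \kappa^{k+1} \tilde A_p^k - \kappa^{k+1} (-F(x_k))$                                     (47)

    $w_k = \tfrac12 |H_k(\tilde g_p^k + \kappa^{k+1} \tilde{\hat g}_p^k)|^2 + \tilde\alpha_p^k + \kappa^{k+1} \tilde A_p^k + \kappa^{k+1} (-F(x_k))$ .              (48)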


5. Termination criterion:

    if $w_k \le \varepsilon$
        stop
    end

6. Line search: We compute step sizes $0 \le t_L^k \le t_R^k \le 1$ and $t_0 \in (0, t_0]$ by
using the line search described in Algorithm 4 and we set

    $x_{k+1} = x_k + t_L^k d_k$ (is created strictly feasible by the line search)   (49)
    $y_{k+1} = x_k + t_R^k d_k$                                                     (50)

    $f_{k+1} = f(y_{k+1})$ ,  $g_{k+1} = g(y_{k+1}) \in \partial f(y_{k+1})$ ,
    $G_{k+1}$ approximating $G(y_{k+1}) \in \partial^2 f(y_{k+1})$                  (51)

    $F_{k+1} = F(y_{k+1})$ ,  $\hat g_{k+1} = \hat g(y_{k+1}) \in \partial F(y_{k+1})$ ,
    $\hat G_{k+1}$ approximating $\hat G(y_{k+1}) \in \partial^2 F(y_{k+1})$ .

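7. Update:

    if $i_n \le i_\rho$
        $\rho_{k+1} = \min(1, C_G / |G_{k+1}|)$
    else
        $\rho_{k+1} = 0$                                                            (52)
    end

We set

    $\hat\rho_{k+1} = \min(1, \hat C_G / |\hat G_{k+1}|)$ .                        (53)

    if $t_L^k \ge t_0$ (serious step)
        $i_n = 0$
        $i_s = i_s + 1$
    else (no serious step, i.e. null or short step)
        $i_n = i_n + 1$                                                             (54)
    end

Compute the updates of the locality measure

    $s_j^{k+1} = s_j^k + |x_{k+1} - x_k|$ for $j \in J_k$                           (55)
    $s_{k+1}^{k+1} = |x_{k+1} - y_{k+1}|$                                           (56)
    $s_p^{k+1} = \tilde s_p^k + |x_{k+1} - x_k|$                                    (57)
    $\hat s_p^{k+1} = \tilde{\hat s}_p^k + |x_{k+1} - x_k|$ .                       (58)

Compute the updates for the objective function approximation

    $f_j^{k+1} = f_j^k + (g_j^k)^T (x_{k+1} - x_k) + \tfrac12 \rho_j (x_{k+1} - x_k)^T G_j (x_{k+1} - x_k)$ for $j \in J_k$
    $f_{k+1}^{k+1} = f_{k+1} + g_{k+1}^T (x_{k+1} - y_{k+1}) + \tfrac12 \rho_{k+1} (x_{k+1} - y_{k+1})^T G_{k+1} (x_{k+1} - y_{k+1})$   (59)
    $f_p^{k+1} = \tilde f_p^k + (\tilde g_p^k)^T (x_{k+1} - x_k) + \tfrac12 (x_{k+1} - x_k)^T G_p^{k+1} (x_{k+1} - x_k)$               (60)

and for the constraint

    $F_j^{k+1} = F_j^k + (\hat g_j^k)^T (x_{k+1} - x_k) + \tfrac12 \hat\rho_j (x_{k+1} - x_k)^T \hat G_j (x_{k+1} - x_k)$ for $j \in J_k$
    $F_{k+1}^{k+1} = F_{k+1} + \hat g_{k+1}^T (x_{k+1} - y_{k+1}) + \tfrac12 \hat\rho_{k+1} (x_{k+1} - y_{k+1})^T \hat G_{k+1} (x_{k+1} - y_{k+1})$   (61)
    $F_p^{k+1} = \tilde F_p^k + (\tilde{\hat g}_p^k)^T (x_{k+1} - x_k) + \tfrac12 (x_{k+1} - x_k)^T \hat G_p^{k+1} (x_{k+1} - x_k)$ .   (62)

Compute the updates for the subgradient of the objective function
approximation

    $g_j^{k+1} = g_j^k + \rho_j G_j (x_{k+1} - x_k)$ for $j \in J_k$
    $g_{k+1}^{k+1} = g_{k+1} + \rho_{k+1} G_{k+1} (x_{k+1} - y_{k+1})$              (63)
    $g_p^{k+1} = \tilde g_p^k + G_p^{k+1} (x_{k+1} - x_k)$                          (64)

and for the constraint

    $\hat g_j^{k+1} = \hat g_j^k + \hat\rho_j \hat G_j (x_{k+1} - x_k)$ for $j \in J_k$        (65)
    $\hat g_{k+1}^{k+1} = \hat g_{k+1} + \hat\rho_{k+1} \hat G_{k+1} (x_{k+1} - y_{k+1})$      (66)
    $\hat g_p^{k+1} = \tilde{\hat g}_p^k + \hat G_p^{k+1} (x_{k+1} - x_k)$ .                   (67)

Choose $J_{k+1} \subseteq \{k - M + 2, \dots, k + 1\} \cap \{1, 2, \dots\}$ with $k + 1 \in J_{k+1}$.
$k = k + 1$
Go to 1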
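
Some of the scalar bookkeeping rules of Algorithm 3 can be sketched in a few lines of Python: the localized approximation errors (36) and (37), the scaled multipliers (42), the damping parameters (52) and (53), and the largest admissible index set $J_{k+1}$. This is only a minimal illustration under stated assumptions, not the implementation referred to in the text. All function names are hypothetical, and returning $\rho = 1$ for a vanishing matrix norm is an assumption made here to avoid a division by zero.

```python
def localized_error(value_at_x, linearization, locality, gamma, omega):
    """max(|f(x_k) - f_j^k|, gamma * (s_j^k)**omega), cf. (36) and (37)."""
    return max(abs(value_at_x - linearization), gamma * locality ** omega)


def normalize_multipliers(mu, mu_p):
    """Return kappa^{k+1} and the scaled multipliers (kappa_j^k, kappa_p^k), cf. (42)."""
    kappa = sum(mu) + mu_p
    if kappa > 0.0:
        return kappa, [m / kappa for m in mu], mu_p / kappa
    # All multipliers of the constraint part vanish.
    return 0.0, [0.0] * len(mu), 0.0


def damping(bound, matrix_norm):
    """rho = min(1, C_G / |G|), cf. (52) and (53).

    Assumption of this sketch: rho = 1 if the matrix norm is zero.
    """
    if matrix_norm == 0.0:
        return 1.0
    return min(1.0, bound / matrix_norm)


def newest_bundle_indices(k, M):
    """Largest admissible J_{k+1}: {k-M+2, ..., k+1} intersected with {1, 2, ...}."""
    return list(range(max(1, k - M + 2), k + 2))
```

For example, `newest_bundle_indices(10, 3)` keeps the three most recent indices 9, 10 and 11, so the bundle never holds more than $M$ elements.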

Remark 6 We will see later that for convergence the approximation of elements
in $\partial^2 f(y)$ and $\partial^2 F(y)$ only needs to satisfy mild conditions. The speed of
convergence will, of course, be influenced by the quality of the approximation.
In our first implementation of the method (Fendl & Schichl) we have
computed elements of the respective sets, but update methods similar to
L-BFGS are also conceivable.

Like in the original unconstrained bundle-Newton method by Lukšan &
Vlček [45], the parameters $i_m$ and $i_r$ as well as the additional parameter $i_l$ are
only needed for proving convergence. Since in practice we usually terminate an
algorithm if a maximal number of iterations Nit_max is exceeded, we always
choose $i_m = i_r = i_l$ = Nit_max + 1 in our implementation of Algorithm 3. The
case distinction for the choice of $W$ according to (30) resp. (31) is only
necessary for showing the superlinear convergence of the original unconstrained
bundle-Newton method for strongly convex, twice continuously differentiable
functions (cf. Lukšan & Vlček [45], p. 385, Section 4). As the choice
$i_\rho = 3$ (cf. the initialization of Algorithm 3) for the case distinction $i_n \le i_\rho$ for
$\rho_{k+1}$ from (52) is due to empirical observations in the original unconstrained
bundle-Newton method (cf. Lukšan & Vlček [45], p. 378), the fact that we
make no case distinction for $\hat\rho_{k+1}$ from (53) was also found out numerically.
A numerically meaningful choice of the positive definite matrices that occur in
(33) is discussed in Fendl & Schichl.

Proposition 7 We have for all $k > 0$

    $|H_k(\tilde g_p^k + \kappa^{k+1} \tilde{\hat g}_p^k)|^2 = d_k^T \big( W_p^k + \sum_{j \in J_k} \lambda_j^k \bar G_j^k + \lambda_p^k \bar G^k + \sum_{j \in J_k} \mu_j^k \hat{\bar G}_j^k + \mu_p^k \hat{\bar G}^k \big) d_k$   (68)

    $w_k = -\tfrac12 d_k^T W_p^k d_k - v_k$ .                                      (69)

Proof. Because of $H_k^{-2} = W_p^k + \sum_{j \in J_k} \lambda_j^k \bar G_j^k + \lambda_p^k \bar G^k + \sum_{j \in J_k} \mu_j^k \hat{\bar G}_j^k + \mu_p^k \hat{\bar G}^k$ due to (41)
and $d_k = -H_k^2 (\tilde g_p^k + \kappa^{k+1} \tilde{\hat g}_p^k)$ due to (39), (42), (43) and (45), easy calculations
yield (68). Furthermore, (69) holds due to (48), (68) and (47). □
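
The "easy calculations" in the proof of Proposition 7 can be spelled out as follows. The abbreviations $S_k$, $q_k$ and $c_k$ are shorthand introduced only for this sketch and do not appear in the algorithm itself.

```latex
% Shorthand (introduced here only):
%   S_k := \sum_{j \in J_k} \lambda_j^k \bar G_j^k + \lambda_p^k \bar G^k
%          + \sum_{j \in J_k} \mu_j^k \hat{\bar G}_j^k + \mu_p^k \hat{\bar G}^k ,
%   q_k := \tilde g_p^k + \kappa^{k+1} \tilde{\hat g}_p^k ,
%   c_k := \tilde\alpha_p^k + \kappa^{k+1} \tilde A_p^k + \kappa^{k+1} (-F(x_k)) .
\begin{align*}
  H_k^{-2} &= W_p^k + S_k , \qquad d_k = -H_k^2 q_k
     \;\Longrightarrow\; q_k = -H_k^{-2} d_k , \\
  |H_k q_k|^2 &= q_k^T H_k^2 q_k
     = d_k^T H_k^{-2} H_k^2 H_k^{-2} d_k
     = d_k^T (W_p^k + S_k) d_k , && \text{which is (68)} \\
  w_k &= \tfrac12 d_k^T (W_p^k + S_k) d_k + c_k , && \text{by (48) and (68)} \\
  v_k &= -d_k^T W_p^k d_k - \tfrac12 d_k^T S_k d_k - c_k , && \text{by (47)} \\
  w_k + v_k &= -\tfrac12 d_k^T W_p^k d_k , && \text{which is (69).}
\end{align*}
```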

Remark 7 If we consider a nonsmooth unconstrained optimization problem
(i.e. we drop the constraint $F(x) \le 0$ in optimization problem (2)) and if we
choose $\bar G_j^k = \bar G^k = 0$, then our formula for $v_k$ from (47) reduces to the formula for
$v_k$ in the unconstrained bundle-Newton method (cf. Lukšan & Vlček [45],
p. 377, formula (13)), since $v_k = -|H_k \tilde g_p^k|^2 - \tilde\alpha_p^k$ due to (47) and (68).


3.3 Presentation of the line search 

We extend the line search of the bundle-Newton method for nonsmooth
unconstrained minimization to the constrained case in the line search described in
Algorithm 4. For obtaining a clear arrangement of the line search, we compute
data concerning the objective function in ComputeObjectiveData and data
concerning the constraint in ComputeConstraintData. Before formulating the
line search in detail, we give a brief overview of its functionality:

Starting with the step size $t = 1$, we check if the point $x_k + t d_k$ is strictly
feasible. If so and if additionally the objective function decreases sufficiently
in this point and $t$ is not too small, then we take $x_k + t d_k$ as new iteration
point in Algorithm 3 (serious step). Otherwise, if the point $x_k + t d_k$ is strictly
feasible and the model of the objective function changes sufficiently, we take
$x_k + t d_k$ as new trial point (short/null step with respect to the objective
function). If $x_k + t d_k$ is not strictly feasible, but the model of the constraint
changes sufficiently (in particular here the quadratic approximation of the
constraint comes into play), we take $x_k + t d_k$ as new trial point (short/null
step with respect to the constraint). After choosing a new step size $t \in [0, 1]$
by interpolation, we iterate this procedure.

Algorithm 4 (Line search). 0. Initialization: Choose $\zeta \in (0, \tfrac12)$ as well as
$\vartheta \ge 1$ and set $t_L = 0$ as well as $t = t_U = 1$.

1. Modification of either $t_L$ or $t_U$:

    if $F(x_k + t d_k) < 0$
        if $f(x_k + t d_k) \le f(x_k) + m_L v_k \cdot t$
            $t_L = t$
        else if $f(x_k + t d_k) > f(x_k) + m_L v_k \cdot t$
            $t_U = t$
        end
    else if $F(x_k + t d_k) \ge 0$
        $t_U = t$
        $t_0 = \hat t_0 t_U$                                                        (70)
    end

    if $t_L \ge t_0$
        $t_R = t_L$
        return (serious step)
    end

2. Decision of return

    if $i_n < i_l$
        if $F(x_k + t d_k) < 0$
            [g, G, ...] = ComputeObjectiveData(t, ...)
            if $Z$ = true
                $t_R = t$
                return (short/null step: change of model of the objective function)
            end
        else if $F(x_k + t d_k) \ge 0$
            [$\hat g$, $\hat G$, ...] = ComputeConstraintData(t, ...)
            if $Z$ = true
                $t_R = t$
                return (short/null step: change of model of the constraint)
            end
        end
    else if $i_n \ge i_l$
        [g, G, ...] = ComputeObjectiveData(t, ...)
        if $F(x_k + t d_k) < 0$ and $Z$ = true
            $t_R = t$
            return (short/null step: change of model of the objective function)
        end
    end

3. Interpolation: Choose $t \in [t_L + \zeta (t_U - t_L)^\vartheta, \; t_U - \zeta (t_U - t_L)^\vartheta]$.

4. Loop: Go to 1

function [g, G, ...] = ComputeObjectiveData(t, ...)

    $g = g(x_k + t d_k) \in \partial f(x_k + t d_k)$
    $G$ = approximation of $G(x_k + t d_k) \in \partial^2 f(x_k + t d_k)$
    $\rho = \min(1, C_G / |G|)$ for $i_n \le 3$ ,  $\rho = 0$ else
    $f = f(x_k + t d_k) + (t_L - t) g^T d_k + \tfrac12 \rho (t_L - t)^2 d_k^T G d_k$             (71)
    $\beta = \max(|f(x_k + t_L d_k) - f|, \gamma_1 |t_L - t|^{\omega_1} |d_k|^{\omega_1})$        (72)
    $\bar G$ = "positive definite modification of $G$"                                            (73)
    $Z = \big[ -\beta + d_k^T (g + \rho (t_L - t) G d_k) \ge m_R v_k + m_f \cdot (-\tfrac12 d_k^T \bar G d_k)$
         and $(t - t_L) |d_k| \le C_S \big]$                                                        (74)

function [$\hat g$, $\hat G$, ...] = ComputeConstraintData(t, ...)

    $\hat g = \hat g(x_k + t d_k) \in \partial F(x_k + t d_k)$
    $\hat G$ = approximation of $\hat G(x_k + t d_k) \in \partial^2 F(x_k + t d_k)$
    $\hat\rho = \min(1, \hat C_G / |\hat G|)$
    $F = F(x_k + t d_k) + (t_L - t) \hat g^T d_k + \tfrac12 \hat\rho (t_L - t)^2 d_k^T \hat G d_k$   (75)
    $\hat\beta = \max(|F(x_k + t_L d_k) - F|, \gamma_2 |t_L - t|^{\omega_2} |d_k|^{\omega_2})$    (76)
    $\hat{\bar G}$ = "positive definite modification of $\hat G$"                                 (77)
    $Z = \big[ F(x_k + t_L d_k) - \hat\beta + d_k^T (\hat g + \hat\rho (t_L - t) \hat G d_k) \ge m_F \cdot (-\tfrac12 d_k^T \hat{\bar G} d_k)$
         and $(t - t_L) |d_k| \le C_S \big]$                                                        (78)
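
The control flow of Algorithm 4 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' code. The model tests (74) and (78) of step 2 are not implemented but passed in as an optional hypothetical callback `null_step_test`. The interpolation of step 3 is replaced by bisection, which is one admissible choice because $\zeta < \tfrac12$, $\vartheta \ge 1$ and $t_U - t_L \le 1$. The iteration cap `max_iter` is an addition of this sketch, and points are plain Python lists.

```python
def line_search_sketch(f, F, x, d, v, t0, t0_hat=0.001, m_L=0.01,
                       null_step_test=None, max_iter=100):
    """Return (t_L, t_R, t0, kind) following steps 0, 1, 3 and 4 of the line search.

    f, F: objective and constraint, taking a list of floats.
    v: predicted descent (negative), t0: current lower bound for serious steps.
    null_step_test(t, t_L, feasible): hypothetical stand-in for step 2.
    """
    def point(t):
        return [xi + t * di for xi, di in zip(x, d)]

    f_x = f(x)
    t_L, t, t_U = 0.0, 1.0, 1.0                  # step 0
    for _ in range(max_iter):
        y = point(t)
        feasible = F(y) < 0.0
        # Step 1: modify either t_L or t_U.
        if feasible:
            if f(y) <= f_x + m_L * v * t:
                t_L = t
            else:
                t_U = t
        else:
            t_U = t
            t0 = t0_hat * t_U                    # rule (70)
        if t_L >= t0:
            return t_L, t_L, t0, "serious"
        # Step 2: decision of return (delegated to the callback, if any).
        if null_step_test is not None and null_step_test(t, t_L, feasible):
            return t_L, t, t0, "short/null"
        # Step 3: bisection as a simple interpolation rule.
        t = 0.5 * (t_L + t_U)
    raise RuntimeError("no step accepted within max_iter trials")
```

With `f = lambda p: -p[0]`, `F = lambda p: p[0] - 0.5`, `x = [0.0]`, `d = [1.0]` and `v = -1.0`, the first two trial points are infeasible, so rule (70) shrinks `t0` twice before the step size 0.25 is accepted as a serious step.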

Remark 8 The parameter $i_l$ is only necessary for proving global convergence of
Algorithm 3 (to be more precise, it is only needed to show that a short or null
step which changes the model of the objective function is executed, cf. the
corresponding lemma of the convergence analysis). If we choose $i_l = 0$, then only
a change of the model of the objective function yields a short or null step. In
fact we have $i_l$ steps in Algorithm 3 in which we can use any meaningful
criterion for terminating the line search (even for the unconstrained case, as it
is partially done in the implementation of the original unconstrained
bundle-Newton method anyway).

(70) is due to the following observation: Consider the line search (Algorithm
4) without (70) (i.e. $t_0$ is fixed, e.g., $t_0 := 0.5 \in (0,1)$, where this large, but
legal value for $t_0$ is only chosen to obtain a better graphical illustration in
Figure 1). It can happen (in particular at the beginning of Algorithm 3) that
the search direction $d_k$ is bad as we have no knowledge on the behavior of
$f$ and $F$ yet. Consequently, the following situation can occur: The model of
the objective function $f$ does not change (e.g., if $f$ is linear on $x_k + t d_k$ with
$t \in [0,1]$), and there are no step sizes $t \ge t_0$ which yield feasible $x_k + t d_k$ (this
is in particular possible if we are near the boundary of the feasible set).

Fig. 1: Line search with fixed $t_0$

In this situation the line search will not terminate for fixed $t_0$ (in particular, in
the case $i_n < i_l$ the model of $F$ does not even need to satisfy (78) for infeasible
$x_k + t d_k$). Therefore, we need to decrease $t_0$ to have at least one feasible step
in the line search for which a descent of $f$ is enough for terminating the line
search (similar to the unconstrained case). As the convergence analysis will
show, this must not be done too often (cf. (143) and the corresponding remark).
Because we use the quadratic terms in the constraint approximation to obtain as
much feasibility as possible on the search path $t \mapsto x_k + t d_k$ with $t \in [0,1]$ (cf. the
idea that leads to the QCQP), we expect that this should be true. Indeed,
in practice $t_0$ turns out to be modified only at the beginning of Algorithm 3, at
least for many examples of the Hock-Schittkowski collection by Schittkowski
(cf. Fendl & Schichl). In particular, if $F(x_k + t d_k) < 0$ for
all $t \in [0,1]$ (e.g., if $F$ is constant and negative on $\mathbb{R}^n$, which in fact yields
an unconstrained optimization problem), the case (70) will never occur and
therefore $t_0$ will not get changed (this is the reason why $t_0$ is constant in the
bundle-Newton method for nonsmooth unconstrained minimization).

The step sizes which the line search returns correspond to the points
$x_{k+1} = x_k + t_L^k d_k$ and $y_{k+1} = x_k + t d_k = x_k + t_R^k d_k$.

Only strictly feasible iteration points are accepted in the line search:

    $F(x_k + t_L^k d_k) < 0$ .                                                      (79)

Nevertheless, trial points may be infeasible (if $i_n < i_l$).


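Proposition 8 Let

    $\hat\alpha_p^k := \sum_{j \in J_k} \lambda_j^k \alpha_j^k + \lambda_p^k \alpha_p^k$ ,  $\hat A_p^k := \sum_{j \in J_k} \kappa_j^k A_j^k + \kappa_p^k A_p^k$       (80)

    $\hat w_k := \tfrac12 |H_k(\tilde g_p^k + \kappa^{k+1} \tilde{\hat g}_p^k)|^2 + \hat\alpha_p^k + \kappa^{k+1} \hat A_p^k + \kappa^{k+1} (-F(x_k))$            (81)

(Note: $\hat w_k$ is the optimal function value of the dual problem (21)). Then we
have at iteration $k$ of Algorithm 3

    $\hat v_k \le v_k \le 0 \le w_k \le \hat w_k$ .                                (82)

Proof. For $\gamma > 0$ and $\omega \ge 1$ the functions $\xi \mapsto \gamma |\xi|^\omega$ and $(\xi_1, \xi_2) \mapsto \max(\xi_1, \xi_2)$
are convex and therefore we have $\gamma |\sum_{i=1}^m t_i x_i|^\omega \le \sum_{i=1}^m t_i \gamma |x_i|^\omega$ and
$\max(\sum_{i=1}^m t_i x_i, \sum_{i=1}^m t_i y_i) \le \sum_{i=1}^m t_i \max(x_i, y_i)$ for all $t_i \ge 0$ with $\sum_{i=1}^m t_i = 1$.
Since $\lambda_j^k \ge 0$ for $j \in J_k$ and $\lambda_p^k \ge 0$ holds for the solution of the dual
problem (21) of the QCQP, we have $1 = \sum_{j \in J_k} \lambda_j^k + \lambda_p^k$, which implies
$f(x_k) = \sum_{j \in J_k} \lambda_j^k f(x_k) + \lambda_p^k f(x_k)$, and hence $\tilde\alpha_p^k \le \hat\alpha_p^k$ follows from
(44), (43), (36) and (80). If $\kappa^{k+1} > 0$, we have $1 = \sum_{j \in J_k} \kappa_j^k + \kappa_p^k$ due to (42),
which implies $F(x_k) = \sum_{j \in J_k} \kappa_j^k F(x_k) + \kappa_p^k F(x_k)$, and hence $\tilde A_p^k \le \hat A_p^k$
follows from (46), (45), (37) and (80). Consequently, we have
$\kappa^{k+1} \tilde A_p^k \le \kappa^{k+1} \hat A_p^k$ for $\kappa^{k+1} \ge 0$, which yields
$0 \le \tilde\alpha_p^k + \kappa^{k+1} \tilde A_p^k \le \hat\alpha_p^k + \kappa^{k+1} \hat A_p^k$ due to (44), (42) and (46).
Now, we obtain the $w_k$-estimate of (82) due to (48) and (81). Because of (80)
and (42) we have
$0 \ge -\tilde\alpha_p^k - \kappa^{k+1} \tilde A_p^k \ge -\sum_{j \in J_k} \lambda_j^k \alpha_j^k - \lambda_p^k \alpha_p^k - \sum_{j \in J_k} \mu_j^k A_j^k - \mu_p^k A_p^k$,
and, therefore, we obtain the $v_k$-estimate of (82) by using (47), (40), (69), the
positive definiteness of $W_p^k$ and $w_k \ge 0$. □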
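
The convexity argument behind $\tilde\alpha_p^k \le \hat\alpha_p^k$ can be checked numerically on small data. The following Python sketch compares the error of the aggregated data, as in (43) and (44), with the weighted sum of the individual errors, as in (36) and (80). The function names and the sample numbers are hypothetical, and the weights are assumed to be nonnegative and to sum to one.

```python
def error_of_aggregate(f_x, values, localities, weights, gamma=1.0, omega=2.0):
    """max(|f(x_k) - sum_j w_j f_j|, gamma * (sum_j w_j s_j)**omega), cf. (43), (44)."""
    f_tilde = sum(w * v for w, v in zip(weights, values))
    s_tilde = sum(w * s for w, s in zip(weights, localities))
    return max(abs(f_x - f_tilde), gamma * s_tilde ** omega)


def aggregate_of_errors(f_x, values, localities, weights, gamma=1.0, omega=2.0):
    """sum_j w_j * max(|f(x_k) - f_j|, gamma * s_j**omega), cf. (36), (80)."""
    return sum(w * max(abs(f_x - v), gamma * s ** omega)
               for w, v, s in zip(weights, values, localities))


# Hypothetical bundle data with two elements and convex weights.
f_x, values, localities, weights = 1.0, [0.0, 3.0], [0.0, 2.0], [0.5, 0.5]
tilde_alpha = error_of_aggregate(f_x, values, localities, weights)
hat_alpha = aggregate_of_errors(f_x, values, localities, weights)
```

For this data the aggregated error is 1.0 and the weighted sum of the errors is 2.5, in line with the first inequality used in the proof.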

Proposition 9 If the line search is entered at iteration $k$ of Algorithm 3, then

    $v_k < 0$ .                                                                     (83)

Furthermore, if there occurs a step size $t$ with $F(x_k + t d_k) \ge 0$ in the line
search, then

    $-\tfrac12 d_k^T \hat{\bar G} d_k < 0$                                          (84)

with $\hat{\bar G}$ from (77).

Proof. If the line search is entered at iteration $k$ (cf. step 6 of Algorithm 3),
then no termination occurred at step 5 of Algorithm 3 at iteration $k$, and
therefore we have $w_k > 0$, which yields (83) due to (69) and the positive
definiteness of $W_p^k$.

Now we show (84) by deducing a contradiction: Suppose (84) does not hold,
i.e. $d_k = 0$ due to the positive definiteness of $\hat{\bar G}$ (cf. (77)). Then, since all
iteration points $x_k$ are strictly feasible due to (79), we obtain
$F(x_k + t d_k) = F(x_k) < 0$, which is a contradiction to
the assumption $F(x_k + t d_k) \ge 0$. □

Proposition 10 1. If the line search (Algorithm 4) terminates with condition
(74), then the old search direction $d_k$ and the old predicted descent $v_k$
(of iteration $k$) are sufficiently infeasible for the new QCQP (38) (at
iteration $k + 1$) in Algorithm 3 (i.e. the old search direction $d_k$ cannot occur as
search direction at iteration $k + 1$ and therefore we obtain a different search
direction at iteration $k + 1$ and consequently a "meaningful extension of the
bundle").

2. If the line search (Algorithm 4) terminates with condition (78), then the
old search direction $d_k$ (of iteration $k$) is sufficiently infeasible for the new
QCQP (38) (at iteration $k + 1$) in Algorithm 3 (i.e. using a QCQP also
yields a "meaningful extension of the bundle" in the constrained case).

3. The condition $(t - t_L) |d_k| \le C_S$ in (78) resp. (74) corresponds to

    $|y_{k+1} - x_{k+1}| \le C_S$ .                                                 (85)





Proof. Because of $f = f_{k+1}^{k+1}$ due to (71), (50) and (59) as well as
$\beta = \alpha_{k+1}^{k+1}$ due to (72), (36) and (56), we obtain
$-\alpha_{k+1}^{k+1} + d_k^T g_{k+1}^{k+1} \ge m_R v_k + m_f \cdot (-\tfrac12 d_k^T \bar G d_k)$ by using (74), (63) and (52). Due
to the initialization of Algorithm 3 we have $0 < m_R < 1$ and $0 \le m_f \le 1$. Now,
(83) resp. (73) imply $m_R v_k > v_k$ and $m_f \cdot (-\tfrac12 d_k^T \bar G d_k) \ge -\tfrac12 d_k^T \bar G d_k$.
Since the line search (Algorithm 4) terminates with condition (74) due to
assumption, we obtain that $d_k$ is sufficiently infeasible for the new QCQP (38)
(with respect to the approximation of the objective function) at iteration $k + 1$.

Because of $F = F_{k+1}^{k+1}$ due to (75), (50) and (61)
as well as $\hat\beta = A_{k+1}^{k+1}$ due to (76), (37) and (56), we obtain
$F(x_k + t_L^k d_k) - A_{k+1}^{k+1} + d_k^T \hat g_{k+1}^{k+1} \ge m_F \cdot (-\tfrac12 d_k^T \hat{\bar G} d_k)$ by using (78), (66) and (53).
Due to the initialization of Algorithm 3 we have $0 < m_F < 1$. Now, (84)
implies $m_F \cdot (-\tfrac12 d_k^T \hat{\bar G} d_k) > -\tfrac12 d_k^T \hat{\bar G} d_k$. Since the line search
(Algorithm 4) terminates with condition (78) due to assumption, we obtain that $d_k$
is sufficiently infeasible for the new QCQP (38) (with respect to the
approximation of the constraint) at iteration $k + 1$.

(85) follows from (49) and (50). □


4 Convergence 

In the following section we prove the convergence of the line search and we 
show the global convergence of the algorithm. 


4.1 Convergence of the line search 

For proving the convergence of the line search (Algorithm 4) we have to identify
a large subclass of locally Lipschitz continuous functions, which is the class
of weakly upper semismooth functions (that contains, e.g., functions that are
the pointwise maximum of finitely many continuously differentiable functions
due to Mifflin, p. 963, Theorem 2).

Definition 4 A locally Lipschitz continuous function $f : \mathbb{R}^n \to \mathbb{R}$ is called
weakly upper semismooth, if

    $\limsup_{i \to \infty} g_i^T d \ge \liminf_{i \to \infty} \frac{f(x + t_i d) - f(x)}{t_i}$        (86)

holds for all $x \in \mathbb{R}^n$, $d \in \mathbb{R}^n$, $\{g_i\}_i \subset \mathbb{R}^n$ with $g_i \in \partial f(x + t_i d)$ and $\{t_i\}_i \subset \mathbb{R}_+$
with $t_i \searrow 0$.

Proposition 11 Let $f : \mathbb{R}^n \to \mathbb{R}$ be weakly upper semismooth, then the line
search (Algorithm 4) terminates after finitely many steps with $t_L^k = t_L$, $t_R^k = t$
and $t_0 > 0$.

Proof. If $F(x_k + t d_k) < 0$ for all $t \in [0,1]$, then this is exactly the same
situation as in the line search of the unconstrained bundle-Newton method,
which terminates after finitely many iterations due to Lukšan & Vlček [45],
p. 379, Proof of Lemma 2.3. Otherwise, since $F$ is continuous and $F(x_k) < 0$,
there exists a largest $\bar t > 0$ with $F(x_k + \bar t d_k) = 0$ and $F(x_k + s d_k) < 0$
for all $s < \bar t$. Therefore, after sufficiently many iterations in the line search
(Algorithm 4) (note that the interval $[t_L, t_U]$ is shrinking at each iteration
of the line search), there only occur $t_L, t_0, t_U$ with $0 \le t_L < t_U < \bar t$ and
$0 < t_0 < t_U < \bar t$ (i.e. from now on all $x_k + t d_k$ with $t \in [t_L, t_U]$ are feasible)
and consequently $t_0$ (where $x_k + t_0 d_k$ is also feasible) does not change anymore
(cf. (70)). Hence, here we also have exactly the same situation as in the line
search of the unconstrained bundle-Newton method, which terminates after
finitely many iterations due to Lukšan & Vlček [45], Proof of Lemma 2.3,
where the only difference in the proof is that we need to use the following
additional argument to obtain the inequality at the bottom of Lukšan &
Vlček [45], p. 379: Since $m_f \in [0,1]$ due to the initialization of Algorithm
3 and since $\bar G$ is positive definite due to (73), the negation of the condition
in (74) that corresponds to the change of the model of the objective function
yields $-\beta + d_k^T (g + \rho (t_L - t) G d_k) < m_R v_k + m_f \cdot (-\tfrac12 d_k^T \bar G d_k) \le m_R v_k$. □

Remark 9 The proof of Proposition 11 only relies on $f$ satisfying (86), the
continuity of $F$ and the strict feasibility of $x_k$. In particular, $F$ does not need
to be weakly upper semismooth.


4.2 Global convergence 

For investigating the global convergence of Algorithm 3 we will follow closely
the proof of global convergence of the bundle-Newton method for nonsmooth
unconstrained minimization in Lukšan & Vlček [45], p. 380-385, Section 3,
with modifications which concern the constrained case and the determination
of the search direction by solving a QCQP. We will work out everything
in great detail so that it is easy to see which passages of the proof are similar
to the unconstrained case and which passages require a careful examination.
Therefore, we assume

    $\varepsilon = 0$ ,  $\lambda_j^k = 0$ for all $j \notin J_k$ ,  $\mu_j^k = 0$ for all $j \notin J_k$ .        (87)

A main difference to the proof of convergence of the unconstrained
bundle-Newton method is that here $H_k$ from (41) depends on the Lagrange multipliers
$(\lambda^k, \lambda_p^k, \mu^k, \mu_p^k)$ of the QCQP (38), which implies that so do the search direction
$d_k$ from (39) (and consequently the new iteration point $x_{k+1}$ from (49) as well
as the new trial point $y_{k+1}$ from (50)) and the termination criterion $w_k$ from
(48) in particular. Furthermore, this dependence does not allow us to achieve
the equality $H_{k+1} = H_k$ in the proof of the main convergence theorem, in
contrast to Lukšan & Vlček [45], top of page 385, Proof of Theorem 3.8, which
extends the complexity of the already quite involved proof of the unconstrained
bundle-Newton method.

Hence we give a brief overview of the main steps of the proof: In Proposition
12 we express the p-tilde data (as, e.g., $\tilde g_p^k$, ...) as convex combinations in
which no p-data (as, e.g., $g_p^k$, ...) occurs. Afterwards we recall a sufficient
condition to identify a vector as an element of the subdifferential in Proposition
13. In Theorem 5 we show that if Algorithm 3 stops at iteration $k$, then
the current iteration point $x_k$ is stationary for the optimization problem (2).
From then on we assume that the algorithm does not terminate (cf. (95)).
After summarizing some properties of positive definite matrices, we deduce
bounds for $\{(W_p^k)^{-1}\}$ and $\{W_p^k + \bar G^k + \kappa^{k+1} \hat{\bar G}^k\}$ in Corollary 2, which will
be essential in the following. Then, in Proposition 16, we show that if some
boundedness assumptions are satisfied and the limit inferior of the sequence
$\{\max(w_k, |x_k - \bar x|)\}$ is zero, where $\bar x$ denotes any accumulation point of the
sequence of iteration points $\{x_k\}$, then $\bar x$ is stationary for the optimization
problem (2), where the proof relies on Carathéodory's theorem as well as on
the local boundedness and the upper semicontinuity of the subdifferentials
$\partial f$ and $\partial F$. Due to the negativity of $v_k$, which holds due to (83), we obtain
the statement $t_L^k v_k \to 0$ in Proposition 17. In Proposition 18 we show some
properties of the shifted sequences $\{x_{k+i}\}$, $\{w_{k+i}\}$ and $\{t_L^{k+i}\}$, where we have
to take care of the dependence on $(\lambda^k, \lambda_p^k, \mu^k, \mu_p^k)$, which we noticed before, in
the proof. Then we recall an estimation of a certain quadratic function on the
interval $[0,1]$ in Proposition 19. After recalling the differentiability of matrix
valued functions to give a formula for the derivative of the matrix square root,
and after formulating the mean value theorem for vector valued functions on a
convex set, we combine these two results to obtain a Lipschitz estimate for the
inverse matrix square root in Proposition 23, which serves as replacement for
the property $H_{k+1} = H_k$ of the proof of the unconstrained bundle-Newton
method as mentioned above. Finally, we prove that under some additional
boundedness assumptions the limit inferior of the sequence
$\{\max(w_k, |x_k - \bar x|)\}$ is always zero and therefore Proposition 16 yields the
main convergence theorem, which states that each accumulation point
$\bar x$ of the sequence of iteration points $\{x_k\}$ is stationary for the optimization
problem (2).

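Proposition 12 Assume that Algorithm 3 has not stopped before iteration $k$
with $k \ge 1$. Then there exist $\hat\lambda_j^k \in \mathbb{R}$ for $j = 1, \dots, k$ with

    $\hat\lambda_j^k \ge 0$ ,  $1 = \sum_{j=1}^k \hat\lambda_j^k$ ,  $(G_p^{k+1}, \tilde g_p^k, \tilde s_p^k) = \sum_{j=1}^k \hat\lambda_j^k (\rho_j G_j, g_j^k, s_j^k)$ .        (88)

If $\kappa^{k+1} > 0$, then there exist $\hat\kappa_j^k \in \mathbb{R}$ for $j = 1, \dots, k$ with

    $\hat\kappa_j^k \ge 0$ ,  $1 = \sum_{j=1}^k \hat\kappa_j^k$ ,  $(\hat G_p^{k+1}, \tilde{\hat g}_p^k, \tilde{\hat s}_p^k) = \sum_{j=1}^k \hat\kappa_j^k (\hat\rho_j \hat G_j, \hat g_j^k, s_j^k)$ .   (89)

If $\kappa^{k+1} = 0$, then (89) holds with

    $\hat\kappa_j^k := 0$ for all $j = 1, \dots, k$ .                              (90)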


Proof, (by induction) Since gp = gl due to (IMll as well as = a\ due to 
(IM)) . (l2^ and ((29l) . as well as g^ = g\ due to (l27ll as well as Ap = A\ due to 

(I57)) . (1^ and (1^ . as well as = G\ and = G) due to (1551) . (IMl) and 
(ESI) resp. (ESI)) the aggregated (p-)constraint of the QCQP (IITRI) at iteration 
fc = 1 of Algorithm [3] coincides with the corresponding bundle constraint, and 
therefore we can drop the aggregated (p-) constraint and consequently the dual 
problem m has only two variables A) and /i), where A) = 1 must hold, so 
that the equality constraint of the dual problem (1311) is satisfied. Now, if we 
set Ap = 0 and = 0, then the dual solution does not change. 

Consequently, (IMll holds due to the same calculations which are performed 
in Luksan & Vlcek [H, Lemma 3.1]. 

Furthermore, we obtain k^ = due to (H31l and therefore we get k] = 1 
for k^ > 0 and k] = 0 for = 0 as well as Kp = 0. Summarizing these 
facts yield that we have at iteration fc = 1 of Algorithm [3] that 
1 for > 0' 


0 for = 0 


and = 0. 


1 for k^~^^ > 0 

Therefore, the base case is satisfied for fc = 1 with k? := s „ ^ 

^ I 0 for = 0 


since (1531) holds due to (1351) . 

Let the induction hypothesis be satisfied (i.e. we have > 0 in partic¬ 
ular) and define 


A+i ._ 


+ Kp'^^kj for > 0 

for = 0 




for j = 1,..., fc 


A+l ._ ^k+l 


'‘k+1 


:= K 


Tc+l ’ 


(91) 

where is part of the solution of the dual problem (ET|) (including 


the aggregated terms) and Kj resp. are set according to (1331) . The 

case = 0 is equivalent to = Kp = 0 for all j = l,...,fc due to 

(133)) and therefore we obtain g^ = 0 and G^^^ = 0 due to (1351) . which implies 
= 0 due to (l67l) . Hence, at iteration fc-|-l in the QCQP (1381) the aggregated 
constraint for F reads in the case ig < ir F{xk+i) — < 0. Since this 

inequality is sharp due to (1731) and (1371) . the aggregated constraint for F is 
inactive at iteration fc -|- 1. Since Lagrange multipliers for inactive constraints 
vanish, we obtain at iteration fc-l-1 (Note that gp'^^ is the Lagrange multiplier 
corresponding to the aggregated constraint for F at iteration fc -|- 1 and note 
that > 0 is the assumption for what we want to show by the inductive 
step k 1 -^ k + 1) = 0 which implies 


..fc+i 


= 0 


fe-i-i 

:E«r‘ 

i=i 


= 1 A 


{k']+^ >0 for all j = 1,..., fc + 1)) (92) 
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due to > 0 and (H^ . In the case is > ir holds anyway, since then 
in the dual problem (ED for the QCQP the aggregated constraints do 
not occur and therefore the corresponding Lagrange multiplier can be set to 
zero. So, the inductive step fc i—fc + 1 for the first two properties of (15^ holds 
in the case > 0 due to ED, (SD and (I^Tl) (Note that we assumed that 
we consider the case > 0 which implies that we can use the induction 

hypothesis for the first two properties of (IHiHl and note that we have > 0, 
since this is the assumption for what we want to show by the inductive step 
/c I—>■ fc + 1) and in the case = 0 due to (IHTl) and ED- The inductive step 
for the third property of (15^ holds in the case > 0 due to (H51) and ED, 
and in the case = 0 due to (|45L (1921) and ED- The inductive step for 
the fourth property of (IMll holds in the case > 0 due to (HHll . (|H7|) . (EHl) 
and ED and in the case = 0 due to (03, (ED and ED- The inductive 
step for the fifth property of (l89ll holds in the case > 0 due to (l45]l . (Ibp . 
(155)) and (ED, in the case = 0 due to (H51) . (IMl) and (1511) . 

In the case = 0 we obtain Kj=Q for all j = 1,..., fc and = 0 due 
to (1421) and therefore (1891) holds due to (1451) and ED- □ 

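Proposition 13 If $x \in \mathbb{R}^n$ and there exist $G_j \in \mathbb{R}^{n \times n}$, $q, y_j \in \mathbb{R}^n$,
$g_j \in \partial f(y_j)$, $s_j, \lambda_j \in \mathbb{R}$ for $j = 1, \dots, L$, where $L \ge 1$, with

    $(q, 0) = \sum_{j=1}^L (g_j + G_j (x - y_j), s_j) \lambda_j$ ,  $1 = \sum_{j=1}^L \lambda_j$ ,  $\lambda_j \ge 0$ ,  $|y_j - x| \le s_j$ ,

for all $j = 1, \dots, L$, then $q \in \partial f(x)$.

Proof. Lukšan & Vlček [45], p. 381, Proof of Lemma 3.2. □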

Theorem 5 If Algorithm 3 stops at iteration $k$, then there exists $\kappa^{k+1} \ge 0$
such that the stationarity conditions hold for $(x_k, \kappa^{k+1})$, i.e. $x_k$ is stationary
for the optimization problem (2).

Proof. Since Algorithm 3 stops at iteration $k$, step 5 of the algorithm, (87)
and (82) imply $w_k = 0$, which is equivalent to

    $\tfrac12 |H_k(\tilde g_p^k + \kappa^{k+1} \tilde{\hat g}_p^k)|^2 = 0 \;\wedge\; \tilde\alpha_p^k = 0 \;\wedge\; \kappa^{k+1} \tilde A_p^k = 0 \;\wedge\; \kappa^{k+1} (-F(x_k)) = 0$   (93)

due to (48), (44), (46), $\kappa^{k+1} \ge 0$ and $F(x_k) < 0$. Using the regularity of $H_k$, (44)
and (43), we obtain from (93)

    $\tilde g_p^k + \kappa^{k+1} \tilde{\hat g}_p^k = 0$ ,  $\tilde s_p^k = 0$ .  (94)

Furthermore, for $\kappa^{k+1} > 0$ we obtain from (93) and (46) that $\tilde{\hat s}_p^k = 0$ and
hence we have either $\kappa^{k+1} = 0$ or $\kappa^{k+1} > 0 \wedge \tilde{\hat s}_p^k = 0$.

We set $x := x_k$, $L := k$, $y_j := y_j$, $s_j := s_j^k$. Then for $G_j := \rho_j G_j$, $g_j := g_j$,
$\lambda_j := \hat\lambda_j^k$, and $q := \tilde g_p^k$ resp. for $\kappa^{k+1} > 0$, $G_j := \hat\rho_j \hat G_j$, $g_j := \hat g_j$, $\lambda_j := \hat\kappa_j^k$, and
$q := \tilde{\hat g}_p^k$ the assumptions of Proposition 13 are satisfied (by using Proposition
12), and therefore we obtain $\tilde g_p^k \in \partial f(x_k)$ and $\tilde{\hat g}_p^k \in \partial F(x_k)$. Now, using (94)
we calculate $0 \in \partial f(x_k) + \kappa^{k+1} \partial F(x_k)$. □

From now on, we demand that Algorithm 3 does not stop, i.e. according
to step 5 of Algorithm 3 and (87) we have for all $k$

    $w_k > 0$ .                                                                     (95)

We summarize some properties of positive (semi)definite matrices. 
Proposition 14 Let A,Bg with B positive semidefinite, then 

A^A + B. (96) 


If A and B are even positive definite, then 


\A^ -B2\ < 


(■^min (A)) 2 +(An:iin(-S)) ^ 


t\A-B\ 


and if additionally A < B holds, then 


(97) 


\B-^\ < |A-i| . (98) 

Proof. (96) is clear. (97) holds due to Higham [p. 135, Theorem 6.2].
Since $B$ is positive definite due to assumption, $B^{-1}$ is positive definite and
since all eigenvalues of a positive definite matrix are positive, we obtain (98)
due to the fact that $A \preceq B \iff B^{-1} \preceq A^{-1}$ (cf. Horn & Johnson [27,
p. 471, Corollary 7.7.4(a)]), the fact that $A \preceq B$ implies $\lambda_i(A) \le \lambda_i(B)$ for
all $i = 1, \dots, N$ (cf. Horn & Johnson [27, p. 471, Corollary 7.7.4(c)]) and
(3). □
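
The estimates (97) and (98) can be checked numerically on a small instance. The following is only a sketch under stated assumptions: the $2 \times 2$ matrices `A` and `P` are arbitrary illustrative choices (not taken from the paper), `B := A + P` so that $A \preceq B$ by (96), $|\cdot|$ is taken to be the spectral norm, and all helper names are hypothetical.

```python
import math

def eig_sym2(m):
    """Eigenvalues (ascending) of a symmetric 2x2 matrix [[a, b], [b, d]]."""
    (a, b), (_, d) = m
    mean = (a + d) / 2.0
    rad = math.hypot((a - d) / 2.0, b)
    return mean - rad, mean + rad

def spec_norm(m):
    """Spectral norm of a symmetric 2x2 matrix (largest absolute eigenvalue)."""
    lo, hi = eig_sym2(m)
    return max(abs(lo), abs(hi))

def sqrt_spd2(m):
    """Principal square root of a symmetric positive definite 2x2 matrix."""
    (a, b), (_, d) = m
    s = math.sqrt(a * d - b * b)      # sqrt(det m)
    t = math.sqrt(a + d + 2.0 * s)    # sqrt(trace m + 2 sqrt(det m))
    return [[(a + s) / t, b / t], [b / t, (d + s) / t]]

def inv2(m):
    """Inverse of a nonsingular 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def add(x, y):
    return [[x[i][j] + y[i][j] for j in range(2)] for i in range(2)]

def sub(x, y):
    return [[x[i][j] - y[i][j] for j in range(2)] for i in range(2)]

# Illustrative data (an assumption): A and P are positive definite, B = A + P.
A = [[2.0, 0.5], [0.5, 1.0]]
P = [[1.0, -0.3], [-0.3, 0.5]]
B = add(A, P)

# Left- and right-hand side of (97).
lhs97 = spec_norm(sub(sqrt_spd2(A), sqrt_spd2(B)))
rhs97 = spec_norm(sub(A, B)) / (math.sqrt(eig_sym2(A)[0]) + math.sqrt(eig_sym2(B)[0]))

# Left- and right-hand side of (98).
lhs98 = spec_norm(inv2(B))
rhs98 = spec_norm(inv2(A))
```

A single instance of course proves nothing; it only makes the direction of the two inequalities concrete.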


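Proposition 15 Let $\{A_k\}$ be a sequence of positive definite matrices $A_k \in \mathbb{R}^{N \times N}_{\mathrm{sym}}$.
Then

$$\{A_k\} \text{ is bounded} \iff \{A_k^{\frac12}\} \text{ is bounded} \qquad (99)$$

and

$$\{A_k\} \text{ is uniformly positive definite} \iff \{A_k^{-1}\} \text{ is bounded} . \qquad (100)$$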

Proof. Since $A_k \in \mathbb{R}^{N \times N}_{\mathrm{sym}}$ is positive definite due to assumption, there exists
an eigenvalue decomposition $A_k = Q_k^T \Sigma_k Q_k$ with $Q_k \in \mathbb{R}^{N \times N}$ orthogonal and
a diagonal matrix $\Sigma_k \in \mathbb{R}^{N \times N}$ with positive diagonal elements and we define
$\mu_k := \lambda_{\max}(\Sigma_k)$. Since (3) implies $|A_k| = \mu_k$ and since $A_k^{\frac12} = Q_k^T \Sigma_k^{\frac12} Q_k$ implies
$|A_k^{\frac12}| = \mu_k^{\frac12}$, we obtain (99).

(100) follows directly from the assumption of the uniform positive definiteness
of $\{A_k\}$ and (3). □
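
A small worked example may help to separate the two equivalences (99) and (100). The diagonal sequence below is our own illustrative choice and does not appear in the source; $|\cdot|$ is assumed to be the spectral norm.

```latex
A_k := \begin{pmatrix} 1 & 0 \\ 0 & \tfrac{1}{k} \end{pmatrix}, \qquad
A_k^{\frac12} = \begin{pmatrix} 1 & 0 \\ 0 & k^{-\frac12} \end{pmatrix}, \qquad
A_k^{-1} = \begin{pmatrix} 1 & 0 \\ 0 & k \end{pmatrix},
\\[1ex]
|A_k| = 1, \qquad |A_k^{\frac12}| = 1, \qquad
\lambda_{\min}(A_k) = \tfrac{1}{k} \to 0, \qquad |A_k^{-1}| = k \to \infty .
```

Here $\{A_k\}$ and $\{A_k^{\frac12}\}$ are both bounded, in line with (99), while $\{A_k\}$ is not uniformly positive definite and $\{A_k^{-1}\}$ is unbounded, in line with (100).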

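Corollary 2 If $\{(W_p^k)^{-\frac12}\}$ is bounded, then $\{(W_p^k)^{-1}\}$ and $\{H_k\}$ are bounded and

$$|(W_p^k)^{-1}| \le C_0 \qquad (101)$$

for all $k \ge 1$ with some positive constant $C_0 > 0$.

If $\{\bar\kappa^{k+1}\}$ is bounded and $\{(W_p^k)^{-\frac12}\}$ is uniformly positive definite, then
$\{H_k^{-1}\}$ is bounded and

$$|W_p^k + \bar G^k + \bar\kappa^{k+1} \hat{\bar G}^k| \le C_1 \qquad (102)$$

for all $k \ge 1$ with some positive constant $C_1 > 0$.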

Proof. Since $(W_p^k)^{-\frac12} = ((W_p^k)^{-1})^{\frac12}$ is bounded due to assumption, $\{(W_p^k)^{-1}\}$
is bounded due to (99) and therefore (101) holds with some positive constant
$C_0 > 0$, which is equivalent to the uniform positive definiteness of $\{W_p^k\}$ due
to (100). Since $W_p^k \preceq H_k^{-2}$ for all $\bar\kappa^{k+1} \ge 0$ with
$H_k^{-2} = W_p^k + \bar G^k + \bar\kappa^{k+1} \hat{\bar G}^k$ due
to (41), we obtain $|H_k^2| \le C_0$ due to (98) and (101), which is equivalent to
$\{H_k\}$ being bounded due to (99).

Since $\{\bar\kappa^{k+1}\}$ is bounded due to assumption, there exists a positive constant
$\chi_0 > 0$ with $\bar\kappa^{k+1} \le \chi_0$ for all $k \ge 1$ (note that $\bar\kappa^{k+1} \ge 0$).
Since $\{(W_p^k)^{-\frac12}\}$ is uniformly positive definite due to assumption, $\{(W_p^k)^{\frac12}\}$ is
bounded due to (100), which is equivalent to $\{W_p^k\}$ being bounded due to (99),
i.e. $|W_p^k| \le \chi_1$ for some positive constant $\chi_1 > 0$ and for all $k \ge 1$. Therefore,
we obtain the boundedness of $|H_k^{-2}| \le \chi_1 + C_G + \chi_0 \hat C_G$ due to (41) and
the initialization of Algorithm 3, which is equivalent to $\{H_k^{-1}\}$ being bounded
due to (99). Furthermore, setting $C_1 := \chi_1 + C_G + \chi_0 \hat C_G$ yields (102) due to
(41) and the initialization of Algorithm 3. □

From now on let the following assumption be satisfied. 

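Assumption 6. Let (95) be satisfied. Furthermore, let $\{(x_k, \bar\kappa^{k+1})\}$ be bounded
and assume there exists $\bar x \in \mathbb{R}^N$ with $\sigma(\bar x) = 0$, where $\sigma : \mathbb{R}^N \to \mathbb{R}$

$$\sigma(x) := \liminf_{k \to \infty} \max \big( w_k, |x_k - x| \big) . \qquad (103)$$

Moreover, let $\{(W_p^k)^{-\frac12}\}$ be uniformly positive definite.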
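
To get a feeling for the quantity $\sigma$ from (103), one can evaluate a finite-horizon surrogate on synthetic data. This is only a rough sketch: a limes inferior cannot be computed from finitely many terms, so the function below merely takes the minimum over a tail of a finite sequence, and both the scalar sequences `xs`, `w` and the name `sigma_tail` are hypothetical choices of ours, not part of the algorithm.

```python
def sigma_tail(w, xs, x, k0):
    """Finite surrogate for (103): min over indices >= k0 of max(w_k, |x_k - x|).

    Works for scalar iterates only; k0 is a list index, not an iteration counter.
    """
    pairs = list(zip(w, xs))[k0:]
    return min(max(wk, abs(xk - x)) for wk, xk in pairs)

# Synthetic data (an assumption): x_k = 1/k tends to 0 and w_k = 1/k^2 tends to 0.
ks = range(1, 1001)
xs = [1.0 / k for k in ks]
w = [1.0 / k**2 for k in ks]

near = sigma_tail(w, xs, 0.0, 500)   # evaluated at the limit point 0
far = sigma_tail(w, xs, 1.0, 500)    # evaluated away from the limit point
```

For this data the surrogate is small at the limit point and stays close to 1 at the point 1, which mirrors the role of the condition $\sigma(\bar x) = 0$ in Assumption 6.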

Next, we present Lemmas 1 to 6, which we need for proving Proposition 16.

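Lemma 1 (Convergence of basic sequences) There exist $\bar K \subset \hat K \subset \{1, 2, \dots\}$
and $\bar\kappa \in \mathbb{R}$ such that

$$x_k \xrightarrow{\hat K} \bar x , \quad w_k \xrightarrow{\hat K} 0 , \qquad (104)$$

$$\bar\kappa^{k+1} \xrightarrow{\bar K} \bar\kappa , \qquad (105)$$

$$x_k \xrightarrow{\bar K} \bar x , \quad w_k \xrightarrow{\bar K} 0 . \qquad (106)$$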

Proof. Since we have $0 = \sigma(\bar x) = \liminf_{k \to \infty} \max(w_k, |x_k - \bar x|)$ due to
assumption and (103) and since $w_k \ge 0$ due to (82), there exist convergent
subsequences of $\{w_k\}_{k \ge 1}$ and $\{x_k - \bar x\}_{k \ge 1}$, i.e. there exists (an infinite set)
$\hat K \subset \{1, 2, \dots\}$ such that (104) holds. Since $\{\bar\kappa^{k+1}\}_k$ is bounded by assumption,
all its subsequences are also bounded. Therefore, in particular, its subsequence
$\{\bar\kappa^{k+1}\}_{k \in \hat K}$ is bounded. Consequently, $\{\bar\kappa^{k+1}\}_{k \in \hat K}$ has an accumulation
point, i.e. there exists (an infinite set) $\bar K \subset \hat K$ and $\bar\kappa \in \mathbb{R}$ such that (105) holds.
Since > 0 for fc = 1, 2,... due to (H^ . we have it G R>o. Since K C K 
and a sequence is convergent, if and only if all of its subsequences converge 
towards the same limit, (11041) yields (I106D . □ 
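
Lemma 2 (Lagrange multipliers) Let $I := \{1, 2, \dots, N+2\}$ (Note: $\mathrm{card}(I) = N + 2$),
$S := \{(g_j^k, s_j^k) : j = 1, \dots, k\} \subset \mathbb{R}^{N+1}$ and
$\hat S := \{(\hat g_j^k, \hat s_j^k) : j = 1, \dots, k\} \subset \mathbb{R}^{N+1}$. Then for $i \in I$ and $k \ge 1$ there exist
$\lambda^{k,i}, \kappa^{k,i} \in \mathbb{R}$ and $(g^{k,i}, s^{k,i}) \in S$, $(\hat g^{k,i}, \hat s^{k,i}) \in \hat S$ such that

$$\tilde g_p^k = \sum_{i \in I} \lambda^{k,i} g^{k,i} , \quad \tilde s_p^k = \sum_{i \in I} \lambda^{k,i} s^{k,i} , \quad 1 = \sum_{i \in I} \lambda^{k,i} , \quad \lambda^{k,i} \ge 0 , \qquad (107)$$

$$\tilde{\hat g}_p^k = \sum_{i \in I} \kappa^{k,i} \hat g^{k,i} , \quad \tilde{\hat s}_p^k = \sum_{i \in I} \kappa^{k,i} \hat s^{k,i} , \quad \Big( 1 = \sum_{i \in I} \kappa^{k,i} \wedge \kappa^{k,i} \ge 0 \Big) \text{ or } \big( \kappa^{k,i} = 0 \text{ for all } i \in I \big) . \qquad (108)$$

In particular, we have

$$\sum_{i \in I} \kappa^{k,i} = \begin{cases} 1 & \text{if } \bar\kappa^{k+1} > 0 \\ 0 & \text{if } \bar\kappa^{k+1} = 0 . \end{cases} \qquad (109)$$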


Proof. We have {gp,Sp) G ch(S') due to (IMll and ()HH)l . Due to Caratheodory’s 
theorem (cf., e.g., Neumaier [H^), for i G I and fc > 1 there exist (g^’®, s^’®) G 
S and A^’® G R such that (I107II holds. Furthermore, we have {gp,Sp) G ch(,5) 
for > 0 and (p^, Sp) = 0 for = 0 due to (IMll and (IMll . In the case 
> 0 there exist ( 5 ^®’®, s'®’®) G S, k'®’' G M for i G / with 1 = 
k'®’® > 0 and {gp,Sp) = Caratheodory’s theorem 

(cf., e.g., Neumaier [H^). In the case k'®+'^ = 0 choosing k'®’® := 0 for all 
i G I yields {gp,Sp) = 0 = X^ze/Hence, (I108|l holds, which 
immediately implies (I109II . □ 


Lemma 3 (Assignment) There exists j{k,i) G {l,...,fc} (i.e. a function 
j : {k G N : k > 1} X I — ^ {l,...,fe}; with g'®’® = s'®-' = s^(fe_.), 

Proof. Use $(g^{k,i}, s^{k,i}) \in S$ and $(\hat g^{k,i}, \hat s^{k,i}) \in \hat S$ for $i \in I$ and $k \ge 1$ from Lemma 2. □


Lemma 4 (Trial point convergence & implications) For all i G I there 
exist Iji G R''' and (an infinite set) K 3 C K 2 C Ki C K with 

Voik.i) Vt ■ (110) 

(5j(fc.z)z5Afc.i)) G dfi.yt) X dF{yi) (111) 

(Pj(fc,z) G7j(fc,z);^ {Gi, tifj . (112) 

Proof. Since \yj(k,i)\ ^ \'^j(k,i) \ + Cs holds for all i G / and for all fc > 1 due 
to (I85|l . the assumption of the boundedness of {xk} yields that {yj(k,i)}k>i,i£i 
is bounded and therefore it has a convergent subsequence, i.e. (110) holds.
Furthermore, the local boundedness of df resp. dF (cf. Proposition [5]) imply 
that the sets Bi := {g € '■ yj{k,i) G k > 1, k G Ki, i G 1} and

B 2 := {g G dF{yj(k,i)) '■ yjik,i) G k > 1, k G Ki, i G 1} are bounded. 
Therefore, Bi x B 2 is bounded and consequently there exists a convergent 
subsequence {gj(k,i),9jik,i)) G df{yj(k,i))^dF{yj(k,i)), i-e. there exists {gi,gi) G 

X and (an infinite set) K 2 C Ki with {gj(^k,i)T 9j{k,i)) i9ij9i)- The 
upper semicontinuity of $\partial f$ resp. $\partial F$ (cf. Proposition 5) and (110) imply that
for all $i \in I$ (111) holds.

Since Pj{k,i) G (0, 1] due to and Cg > 0, we obtain Pj(k,i)\Gj{k,i) | < Co, 
which yields the boundedness of the sequence {pj[k,i)\Gj(k,i)\}- Due to (11071) . 
the sequence {A^’*} is bounded. Since Pj(k,i) G (0,1] due to (l5Hl) and Cq > 0, 
we obtain Pj(k,i)\Gj(k,i) \ Fi Gg, which yields the boundedness of the sequence 
{Pj{k,i)\Gj(k,i)\}- Due to l|108l) . the sequence is bounded. Therefore, the 

sequence {pj(^k,i)\Gj{k,i)\,>^''’\Pj{k,i)\Gj(^k,i)\,K^'''} is bounded. Consequently, 
there exists a convergent subsequence of {pj(k,i)\Gj(^k,i)\, Pj{k,i)\Gj(k,i)\, 

i.e. for all i £ J there exist Gi^Gi G A^ , Ki £ M and (an infinite set) 

F 3 C K 2 such that (11121) holds. □ 

Lemma 5 (Complementarity condition) We have 

'^K{gt + Gi{x-y^)) + k'^Ki{gi + Gi{x - yi)) =0 

iG/ iei 

Xk’^sk’^ ^ 0 

nk’^sk'^ 0 if K > 0 

Furthermore, the complementarity condition $\bar\kappa F(\bar x) = 0$ holds.

Proof. We calculate g^ Y.iei {9i + Gi{x- yt)) and g^ ^ Y.iei (ffi + 
Gi[x-yi)) by using (I107p . (11081) . Lemma[31 ([11]), (|112p . (Illip . (11061) and (IllOl) 
Since {k^+^I is bounded and {{Wp)~^} is uniformly positive definite (both 
due to assumption), Corollary H] implies the boundedness of {Fl^^}. Because 
of (I106|) . (jlHl) and (IR^ . we have \F[k{gp + k^~^^gp)\ 0, which implies (IllOl) 

due to the regularity of Ffk , mnsi) and the uniqueness of a limit and a.p 0, 
which implies (11141) due to (I44L (11071) . Lemma HJ (I107|) and (Ha, as well as 
F{xk) 0, which implies 0 = kF{x) due to (11051) . the continuity of F 
and (11061) . as well as k^+^A^ 0 which implies for k > 0 that (II151) holds 

due to (I105L (Hl^ . (I108|) . LemmaO (11081) and ([T71) . □ 


(113) 

(114) 

(115) 


Lemma 6 (Subdifferential elements) We have


'^Xt{gi+Gi[x-yi)) G df{x) , 
iGl 


^i{9i + Gi{x- yi)) G dF{x) if k > 0 
i&I 

{0} = RdF{x) 


if K = 0 . 
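
Proof. Since (107) holds for all $k \in K_3$, (112) implies $\sum_{i \in I} \bar\lambda_i = 1$. Due
to (112), we have $\lim_{K_3} \sum_{i \in I} \kappa^{k,i} = \sum_{i \in I} \bar\kappa_i$. If $\bar\kappa > 0$, then (because of
(105) and since $K_3$ ($\subset \bar K$) is an infinite set) there exists $\hat k \in K_3$ such that
$|\bar\kappa^{k+1} - \bar\kappa| < \frac{\bar\kappa}{2}$, which implies $0 < \frac{\bar\kappa}{2} < \bar\kappa^{k+1}$ for all $k \in \hat K_3$, where
$\hat K_3 := \{k \in K_3 : k \ge \hat k\} \subset K_3$ is an infinite set. Therefore, we obtain
$\sum_{i \in I} \kappa^{k,i} = 1$ for all $k \in \hat K_3$ due to (109), i.e.
$\{\sum_{i \in I} \kappa^{k,i}\}_{k \in \hat K_3}$ is constant on $\hat K_3$ and
hence we have $\lim_{\hat K_3} \sum_{i \in I} \kappa^{k,i} = 1$. Since the sequence $\{\sum_{i \in I} \kappa^{k,i}\}_{k \in K_3}$
is convergent, the (infinite) subsequence $\{\sum_{i \in I} \kappa^{k,i}\}_{k \in \hat K_3}$ (of the sequence
$\{\sum_{i \in I} \kappa^{k,i}\}_{k \in K_3}$) converges towards 1 and a sequence is convergent if and
only if all its subsequences converge towards the same limit, the limit of the
sequence $\{\sum_{i \in I} \kappa^{k,i}\}_{k \in K_3}$ must be 1. Consequently, we obtain for $\bar\kappa > 0$ that
$\sum_{i \in I} \bar\kappa_i = 1$.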

Due to (11141) the sequence {A^’*s*^’®}fcg/G 3 is convergent and therefore neces¬ 
sarily bounded, i.e. there exists C > 0 with 0 < due to Lemma [3] as 

well as (fT7l) and therefore is bounded due to ( 11121 ) for Xt 0 , where 

at least one such Ai exists because X^iG/“ f' Since the locality measure 
is monotone due to (IT31l . {s^’^}ksK3 is monotone. Consequently, {s*^’®}fcgiC 3 is 
convergent for Xi 0, i.e. there exists Si := lim^a s^’®. Therefore, (If 141) . (11121) 
and Ai 0 imply Si = 0. Hence, we obtain for A^ 0 that \x — yi\ =0 due 
to Lemma [21 (|T3)) . (|f fOI) and (|f06|) . For k > 0 the sequence {K^’^s^’^jkeXs is 
convergent due to (If 1 5|l and therefore necessarily bounded, i.e. there exists 
C > 0 with 0 < s^’® < due to Lemma [3] as well as (113 and therefore 
{s^’®}feg_R -3 is bounded due to (If 121) for Ri ^ 0, where at least one such Ri 
exists because "Yhi^iRi = 1- Since the locality measure is monotone due to 
m and (1181) . is monotone. Consequently, {s^'^}kGK3 is convergent 

for Ri 7 ^ 0, i.e. there exists Sj := limiG 3 s^’®. Therefore, (If 151) . ([1121) and iti 7 ^ 0 
imply Si = 0. Hence, we obtain in the case k > 0 for Ri ^ 0 that \x — yi\ = 0 
due to LemmajH (fTT)) . (fT31) . (If fOI) and (II06|) . Therefore, if Ai 7 ^ 0 resp. if k > 0 
and Ri 7 ^ 0 , then \x — yi\ = 0 . 

If we set q := + G^{x- y^)) , s* := 1 A■ ^ 0 } 

we set q' := J^iei + Gi{x - yt)), s' := 1 ^ 0 } 

K > 0, then the assumptions of Proposition |T2] are satisfied and therefore we 
obtain the first two desired results. Since F is locally Lipschitz continuous, 
dF(x) is in particular bounded due to Proposition [5] and consequently we 
obtain RdF{x) = {0} in the case ic = 0. □ 


Proposition 16 Let Assumption [3] be satisfied. Then there exists k £ R>o 
such that holds for {x, k), i.e. if the sequence of iteration points and (single) 
Lagrange multipliers is bounded and the sequence of iteration points has an 
accumulation point with cr(x) = 0, then this accumulation point is stationary 
for the optimization problem m- 

Proof. Due to Ci), the continuity of F and (IlHhII . we obtain F{x) < 0. Due to
Lemma O the complementarity condition RF{x) = 0 holds. Using (llldll and 
Lemmaini we calculate 0 £ df{x) + hdF{x). □ 

Proposition 17 Let (95) be satisfied. If there exist $\bar x \in \mathbb{R}^N$ and $K \subset \{1, 2, \dots\}$
with $x_k \xrightarrow{K} \bar x$, then

$$t_L^k v_k \xrightarrow{K} 0 . \qquad (116)$$

Proof. Luksan & Vlcek (isl . Proof of Lemma 3.5(ii)]. □ 

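Proposition 18 Let (95) be satisfied, let the sequence of (symmetric, positive
definite matrices) $\{H_k\}$ be bounded and assume that there exists an infinite
subset $K \subset \{1, 2, \dots\}$ and $\bar x \in \mathbb{R}^N$ with

$$x_k \xrightarrow{K} \bar x . \qquad (117)$$

Then we have for all $i \ge 0$

$$x_{k+i} \xrightarrow{K} \bar x . \qquad (118)$$

If additionally $\sigma(\bar x) > 0$ holds, then we have for all $i \ge 0$

$$t_L^{k+i} \xrightarrow{K} 0 \qquad (119)$$

and for fixed $\varepsilon_0 > 0$ and for all fixed $r \ge 0$ there exists $\tilde k \ge 0$ such that

$$w_{k+i} > \frac{\sigma(\bar x)}{2} , \quad t_L^{k+i} < \varepsilon_0 \qquad (120)$$

for all $k > \tilde k$, $k \in K$ and $0 \le i \le r$.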

Proof. We show (11181) by induction: The base case holds for i = 0 due to 
assumption (11171) . Now, let the induction hypothesis be satisfied for i > 0. We 
have 

dk+, = ( 121 ) 

due to dSni), (HU), (HHl) and (jiHI) as well as 

i|7L,+,(5p"+* + s'=+*+i|^-+*)P < 

dl+,W^+^dk+^^dl+,{ 

jeJk+i 

( 122 ) 

due to (IB51) and the positive definiteness of Wp~^'^ as well as 

^ .k+^+l^k+^ ^ -k+^+l ^ > 0 (123) 


due to (HU), (02), (02) and (172- Now, using (02, (I121D . (I122p . adding (I123p . 
using (02, the boundedness of {Hk} (by assumption), t^'‘ £ [0,1] and (11161) 
yields |a;fc+(i+i) — Xk+i\ 0, and therefore — a;| 0 follows from

the induction hypothesis. 

We show (firm by contradiction: Suppose (HHl) is false, i.e. there exists 
i > 0,t > 0, K C K: > t for all k € K. Since 0 < iwk+i < —t^^Vk+i 0 

due to (EH), SHI), dSHl), e [o,i], (gZ]) and (fTTCll . we have Wk+i 0 and 
therefore we obtain a{x) = 0 due to (IlHHIl and (I118L which is a contradiction 
to the assumption (j{x) > 0. 

We show (inni): Let r > 0 be fixed and 0 < i < r. Since we have < 
limif Wk+i due to the assumption cj{x) > 0, (|l()3li and (I118I1 . because of (HID 
and because Eq > 0 is a fixed number by assumption, there exist ki > 0 
with < Wk+i and < Eq for all k > ki with k £ K. Now, setting 
k := maxj/ci : 0 < i < r} yields (11201) . □ 

Proposition 19 Let p,g,A G and c, u, rc, /3 G K, m G (0,1), a > 0 with 

w = i|pp + a, V =+ a) , -P-g^p>mv, c = max (|g|, |p|, Va) 

and define Q : M —M by 

Q{v) ■= \\yg + (1 - v){p + A)\^ + + (1 - v)a , 


then 

min^ Q{u) <w- w'^ + 4c|Z\| + ^\A\^ . 

Proof. Luksan & Vlcek (H, Lemma 3.4]. 


□ 


We introduce the following notation (cf. Magnus & Neudecker [48,
p. 31, Section 2 resp. p. 34, Section 4]).


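Definition 5 Let $A, B \in \mathbb{R}^{N \times N}$. We define the Frobenius norm of $A$ by
$|A|_F := \big( \sum_{i=1}^N \sum_{j=1}^N A_{ij}^2 \big)^{\frac12}$ and we define the vectorization $A_{(:)}$ of $A$ as well as
the Kronecker product $A \otimes B$ of $A$ and $B$ by

$$A_{(:)} := \begin{pmatrix} A_{:1} \\ \vdots \\ A_{:N} \end{pmatrix} \in \mathbb{R}^{N^2} , \qquad
A \otimes B := \begin{pmatrix} A_{11} B & \dots & A_{1N} B \\ \vdots & & \vdots \\ A_{N1} B & \dots & A_{NN} B \end{pmatrix} \in \mathbb{R}^{N^2 \times N^2} , \qquad (124)$$

where $A_{:j}$ denotes the $j$-th column of $A$.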


Proposition 20 Let $A, B, C \in \mathbb{R}^{N \times N}$. Then

$$|A| \le |A|_F \le \sqrt{N} |A| , \quad (ABC)_{(:)} = (C^T \otimes A) B_{(:)} , \quad |A \otimes A| \le N |A|^2 . \qquad (125)$$

Proof. The first property of (125) holds due to Golub & Van Loan
[p. 56, Section 2.3.2], the second holds due to Magnus & Neudecker
[48, p. 35, Theorem 2], and the third holds due to (124). □
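
The second identity in (125) depends on the vectorization stacking columns (not rows). The following sketch checks it on one small integer instance; the matrices and all helper names are our own illustrative assumptions.

```python
def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def transpose(x):
    return [list(row) for row in zip(*x)]

def vec(x):
    """Column-stacking vectorization: first column, then second column, ..."""
    return [x[i][j] for j in range(len(x[0])) for i in range(len(x))]

def kron(x, y):
    """Kronecker product: block (i, j) of the result equals x[i][j] * y."""
    p, q = len(y), len(y[0])
    return [[x[i][j] * y[k][l] for j in range(len(x[0])) for l in range(q)]
            for i in range(len(x)) for k in range(p)]

def matvec(m, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

# Illustrative integer matrices (an assumption), so the comparison is exact.
A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]
C = [[2, -1], [7, 3]]

lhs = vec(matmul(matmul(A, B), C))           # (ABC)_(:)
rhs = matvec(kron(transpose(C), A), vec(B))  # (C^T (x) A) B_(:)
```

With a row-stacking vectorization the roles of $A$ and $C^T$ in the Kronecker product would be exchanged, which is why the convention in (124) matters.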


Now, we introduce differentiability of matrix valued functions (cf. Magnus
& Neudecker [48, p. 107, Definition 3]).

Definition 6 Let $A : \mathbb{R}^p \to \mathbb{R}^{N \times N}$ and $\mu_0 \in \mathbb{R}^p$ be fixed. If there exists
$B(\mu_0) \in \mathbb{R}^{N^2 \times p}$ with

$$A_{(:)}(\mu_0 + \mu) = A_{(:)}(\mu_0) + B(\mu_0) \mu + R_{(:)}(\mu_0, \mu) \qquad (126)$$

for all $\mu \in \mathbb{R}^p$ in a neighborhood of $\mu_0$ and $\lim_{\mu \to 0} \frac{R_{(:)}(\mu_0, \mu)}{|\mu|} = 0$, then $A$ is said
to be differentiable at $\mu_0$. Furthermore, the $N \times N$-matrix $dA(\mu_0, \mu)$ defined
by

$$dA_{(:)}(\mu_0, \mu) := B(\mu_0) \mu \in \mathbb{R}^{N^2} \qquad (127)$$

is called the (first) differential of $A$ at $\mu_0$ with increment $\mu$ and $B(\mu_0)$ is called
the first derivative of $A$ at $\mu_0$.

Proposition 21 Let $T := \{Y : Y \in \mathbb{R}^{N \times N}, \det Y \ne 0\}$ be the set of nonsingular
$N \times N$-matrices. If $A : \mathbb{R}^p \to T$ is $k$ times (continuously) differentiable,
then so is $B : \mathbb{R}^p \to T$ defined by $B(\mu) := A(\mu)^{-1}$ and

$$dB(\mu_0, \mu) = -B(\mu_0) \, dA(\mu_0, \mu) \, B(\mu_0) . \qquad (128)$$

Proof. Magnus & Neudecker [48, p. 156, Theorem 3]. □
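
The formula (128) can be made plausible by a short formal computation. This is only an informal sketch that assumes the product rule for differentials and the differentiability of $B$; the rigorous statement is the cited theorem.

```latex
% Start from the defining identity of the inverse, valid for all \mu:
A(\mu) B(\mu) = I .
% Take differentials at \mu_0 with increment \mu (product rule; dI = 0):
dA(\mu_0, \mu) \, B(\mu_0) + A(\mu_0) \, dB(\mu_0, \mu) = 0 .
% Multiply from the left by A(\mu_0)^{-1} = B(\mu_0):
dB(\mu_0, \mu) = -B(\mu_0) \, dA(\mu_0, \mu) \, B(\mu_0) .
```

In the scalar case $N = 1$ this reduces to the familiar rule $(1/a)' = -a'/a^2$.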

Proposition 22 Let $f : \Omega \subset \mathbb{R} \to \mathbb{R}^q$ (with an open interval $\Omega$) be continuously
differentiable and let $\omega := \sup_{\xi \in \Omega} |f'(\xi)| < \infty$, then

$$|f(y) - f(x)| \le \omega |y - x| \qquad (129)$$

for all $x, y \in \Omega$ (i.e. $f$ is Lipschitz continuous on $\Omega$).

Proof. This is a direct consequence of the mean value theorem for vector valued
functions (cf., e.g., Heuser [p. 278, 167.4 Mittelwertsatz für vektorwertige
Funktionen]). □
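
The estimate (129) can be illustrated on a concrete curve. The sketch below is built on assumptions of ours: the function `f`, the interval $(0, 3)$ and the grid are illustrative choices, and `OMEGA` is only an upper bound for $\sup |f'|$ (which is enough, since (129) remains valid for any constant that is at least the supremum).

```python
import math

def f(t):
    """Illustrative curve f : (0, 3) -> R^2 (an assumption, not from the paper)."""
    return (math.sin(t), math.cos(2.0 * t))

def dist(u, v):
    """Euclidean distance in R^2."""
    return math.hypot(u[0] - v[0], u[1] - v[1])

# f'(t) = (cos t, -2 sin 2t), hence |f'(t)|^2 = cos^2 t + 4 sin^2 2t <= 5.
OMEGA = math.sqrt(5.0)

# Grid points inside the open interval (0, 3).
grid = [0.01 * i for i in range(1, 300)]

# Largest difference quotient |f(y) - f(x)| / |y - x| over all grid pairs.
worst_ratio = max(dist(f(y), f(x)) / abs(y - x)
                  for x in grid for y in grid if x != y)
```

By (129) the value `worst_ratio` cannot exceed `OMEGA`; a grid search like this can only illustrate the bound, not establish it.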

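Proposition 23 Let $\{\bar\kappa^{k+1}\}$ be bounded and let $\{(W_p^k)^{-\frac12}\}$ be bounded and
uniformly positive definite. For $k \ge 1$ we define $Z_k : \mathbb{R}_{\ge 0} \to \mathbb{R}^{N \times N}$

$$Z_k(s) := (W_p^k + \bar G^k + s \hat{\bar G}^k)^{-\frac12} . \qquad (130)$$

Then we have for all $k \ge 1$

$$|Z_k(\bar\kappa^{k+2}) - Z_k(\bar\kappa^{k+1})| \le C_5 |\bar\kappa^{k+2} - \bar\kappa^{k+1}| , \quad 0 < C_5 < \infty , \qquad (131)$$

where $C_5 := C_2 C_4$, $C_4 := N C_0^2 C_3$, $C_3 := N^{\frac12} \hat C_G$ and $C_2$ is a positive constant.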

Proof. We define for all $k \ge 1$

$$Y_k(s) := (W_p^k + \bar G^k + s \hat{\bar G}^k)^{-1} \qquad (132)$$

and therefore we have $|Y_k(\bar\kappa^{k+1})^{-1}| \le C_1$ for all $k \ge 1$ due to (132) and (102),
which is equivalent to $\{Y_k(\bar\kappa^{k+1})\}$ being uniformly positive definite due to
(100), i.e. there exists $\tilde C_2 > 0$ with $\lambda_{\min}(Y_k(\bar\kappa^{k+1})) \ge \tilde C_2$. Consequently, we
obtain for all $k \ge 1$

$$\big( \lambda_{\min}(Y_k(\bar\kappa^{k+2}))^{\frac12} + \lambda_{\min}(Y_k(\bar\kappa^{k+1}))^{\frac12} \big)^{-1} \le \tfrac12 \tilde C_2^{-\frac12}$$

and hence we estimate for all $k \ge 1$

$$|Y_k(\bar\kappa^{k+2})^{\frac12} - Y_k(\bar\kappa^{k+1})^{\frac12}| \le C_2 |Y_k(\bar\kappa^{k+2}) - Y_k(\bar\kappa^{k+1})| , \qquad (133)$$

due to (97), where we set $C_2 := \tfrac12 \tilde C_2^{-\frac12} > 0$.

Defining 

Xfe(s) := + G'^ + Uk{s) , Ukis) := sd'^ , Ut := (134) 

for fc > 1, we calculate — s) = Uk{t — s) due to (11341) . Therefore, we 

have Xk{t) = Xk{s) + Uk(t — s) for all fc > 1 and for all s, t G R due to (11341) . 
which is equivalent to X/^j^.^t) = Xj, (.)(s) + Uk{t — s). Consequently, (11261) 
and (112711 imply that the differential of Xk at s is given by 

dXk^(.){s,t - s) = Ukit - s) (135) 

(with Rk{s,t — s) = 0) and that the derivative of Xk at s is constant, which 
implies that Xk is continuously differentiable. Furthermore, we estimate for 
all /c > 1 

\Uk\<C3, (136) 

due to (I134L (I125|) and the initialization of Algorithm [31 

Since (7^ is symmetric and positive definite, we obtain that Uk{s) is sym¬ 
metric and positive semidefinite for all s > 0 (cf. (11341) '). Consequently, we 
have Wp ^ Xk{s) due to the symmetry and the positive definiteness of (7^, 
(IM)) and (I134|) . Therefore, we estimate for all fc > 1 and for all s > 0 

|lfc(s)| < Co . (137) 

due to (I132L (I134|) . (1351) and (IIOID . 

For fe > 1 Yfc we define 

Vkis)-.= {Ykis)®Yk{s))Uk ■ (138) 

Since Xk is continuously differentiable (cf. (I135|) '). Proposition |3T] yields the 
continuous differentiability of Yfc (s) = Xfe(s)“^ due to (11321) and (113411 . as well 
as dyfe,(:)(s,t - s) = Vk{s){t - s) due to (11281) . (|125l) . Yfc(s) G (11351) 

and (11381) and therefore (11271) implies that Vk{s) is the derivative of Yk at s. 
Furthermore, we estimate for all fe > 1 

sup |Vfc(s)| < (74 . (139) 

s>0 


due to (I138L (I125|l . (113611 and (11371) . 

Since $Y_k$ is continuously differentiable on $S := \{\xi \in \mathbb{R} : \xi \ge 0\}$
(note that $S$ is an interval) and since the derivative of $Y_k$ at $s$ is given by $V_k(s)$
(cf. (138)) and since the norm of the derivative $|V_k(s)|$ is bounded on $S$ due
to (139), we obtain

$$|Y_{k,(:)}(t) - Y_{k,(:)}(s)| \le C_4 |t - s| \qquad (140)$$

for all $s, t \in S$ and for all $k \ge 1$ due to (129).

Now, we estimate for all fc > 1 

|Zfe(s'=+2) - Zfc(/c'=+i)| < C5 |k'=+" - 

due to (I130L (I132|) . (11331) . (11251) and (11401) . 

Furthermore, we obtain Cs = C 2 NiC'^CG and therefore the fact that 
C 2 is a positive constant due to (11331) . the fact that V > 1 is a fixed finite 
natural number, combining (nni]) with the positive definiteness of IFp, and 
the initialization of Algorithm [3] yield (|131D . □ 

From now on let the following assumption be satisfied. 

Assumption 7. Let (95) be satisfied. Furthermore, let the sequence $\{(x_k, \bar\kappa^{k+1})\}$
be bounded, let the sequence (of symmetric, positive definite matrices) $\{(W_p^k)^{-\frac12}\}$
be bounded as well as uniformly positive definite and let $\bar x \in \mathbb{R}^N$ be any
accumulation point of $\{x_k\}$, i.e. there exists (an infinite set) $K \subset \{1, 2, \dots\}$ with

$$x_k \xrightarrow{K} \bar x , \qquad (141)$$

and demand

$$\bar\kappa^{k+2} - \bar\kappa^{k+1} \xrightarrow{K} 0 \qquad (142)$$

as well as

$$t_0^{\inf} := \inf_{k \ge 0} t_0^k > 0 \qquad (143)$$

(cf. Remark 10).

Next, we present Lemmas 7 to 18, which we need for proving Theorem 8.


Lemma 7 (Bounded basic sequences) The following boundedness state¬ 
ments hold: 

{Vk}, {pkGk}, {pkGk} and {pk} are bounded, (144) 

{Hk} is bounded, (145) 

{ffLi {HkPk} and are bounded. (146) 

Proof. (I144|) holds as this statement was shown in the proof of LemmalU where 
only the assumption of the boundedness of {xk} was used, and consequently, 
this statement is here also true. Since {{Wp)~^} is bounded due to assump¬ 
tion, (I145|) holds due to Corollary O Due to (133)) . the boundedness of {xk} 
and (|144l) resp. {HkPkl < \Hk\ ■ \gk\ and (I145p resp. ([33]), ([Ml), dMl), (ED), the 
Cauchy-Schwarz inequality and the fact that / is continuous on (the whole) 
R", we obtain dna. □ 
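
Lemma 8 (Bounded aggregate sequences) We define

$$\tau_k := \tilde\alpha_p^k + \bar\kappa^{k+1} \tilde A_p^k + \bar\kappa^{k+1} (-F(x_k)) \ge 0 , \qquad (147)$$

then

$$\{w_k\}, \{\tilde g_p^k\}, \{\tilde{\hat g}_p^k\}, \{\tilde\alpha_p^k\}, \{H_k(\tilde g_p^k + \bar\kappa^{k+1} \tilde{\hat g}_p^k)\} \text{ and } \{\tau_k\} \text{ are bounded} . \qquad (148)$$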


Proof. Since {X, \p, g, fj,p) £ R^d^tl+i) -^ith 


r 1 for ? — /c "1 

I 0 for j € Jk\ {k} J ’ ■= 0 ’ '■= 0 j & Jk , gp ■■= 0 

(149) 

is feasible for the (dual) problem (I^Tl) for fc > 1 (Note: This problem is written 
as a minimization problem), we obtain (Note: Wk is the optimal function value 
of (pril l due to ([HU), (021), (IHOl) . (|45|1 . (IT^ and inserting the feasible point from 
(I149|l that Wk < + cik- Hence, due to ()rS)l and (IH^ . we estimate 

0 < ^\Hk{gp + + dp + R'^'^^Ap + ( — F{xk)) < ^\Hkgk\'^ + 

and therefore (11461) as well as the non-negativity of dp, Ap resp. —F{xk) 
due (021), (021), (011) resp. ((711) imply that {wk}, {dp}, {R^~^^Ap}, {Hk{gp + 
K^+^^p)} and {rfc} are bounded. Now, consider the proof of Lemma jS) There 

we only used the first consequence Xk x of (11061) (and this property is 
also satisfied here due to (I141II 1 of the assumption a{x) = 0 for showing the 
convergence of resp. on a subsequence. Consequently, g^ resp. g^ are 

also bounded here. [The second property [wk ^ 0) of (110611 resulting from 
(j{x) = 0 there, is first used directly after proving the boundedness of g^ and 
gp. If this property was already used for showing these boundedness results, 
the above implication would be false, since then indeed cr(x) = 0 (and not only 
Xk a:) would be used for proving the boundedness of g^ and g^, and the 
relevant situation in the proof of Theorem |8l will be cr(x) > 0.] □ 


Lemma 9 ($\sigma$ is finite) $\sigma(\bar x)$ is finite.

Proof. This is true due to (103), the assumption of the boundedness of $\{x_k\}$
and (148). □


Lemma 10 (Cauchy sequences) We have 


K 


^ 0 


f{xk+i)-f{xk)-^0, F{xk+i) - F{xk)0 


P+i - ^ 0 


pk+l _ pk K Q 

X p X p 7 \J 




(150) 

(151) 

(152) 


(153) 


where 


Ak := Hk+i{{gl+^ + - {fp + ■ 

Proof. Since the assumptions of Proposition 18 for applying (118) are satisfied
— Xk X holds due to (11411) . a{x) > 0 holds due to LemmaIH]— applying 

(I118D for i = 1 and i = 0 yields Xk+i — Xk 0. Due to (157)1 and (I55)) . 
we obtain (fTHHl) . Because of (HHD and the continuity of / and we obtain 
(I151|) . Due to (1551) the assumptions of Proposition [TT] are satisfied and therefore 
we estimate using (l 88 ll . (114411 and (l5^ resp. (l89l) . (l90ll . (1144)1 and ()53ll that 
|Gp+^| < Cg , |Gp+^| < Cg- Due to (1501) . the Cauchy-Schwarz inequality 

and (11481) resp. (1571) . the Cauchy-Schwarz inequality and (11481) resp. (1153)) . 
(EH), (ED, (ill), ESI) and the boundedness of (by assumption), we 

obtain (1152)1 . □ 

Lemma 11 (Zero sequence) We have 

\{al+^ - al) + - Al) + R^+\F{xk) - F{xk+i)) | -^ 0 . 

( ol^ \ 

■ 7 ^]^ due to (H51l and (HU) and because of the 

boundedness of {5^} due to (11481) . is bounded. Since the function ^ 1 —>■ 
with wi > 1 is Lipschitz continuous on every bounded subset of 1R+, there 
exists Cl > 0 with 

. (154) 

In the case = 0, we have = 0 due to (1421) and (1451) . Now consider 

the case > 0. Because of 0 < (H51) 

and ()^51l and because of the boundedness of due to assumption and 

the boundedness of {R^^^A^} due to (114811 . R^^^s^ is bounded. Therefore, 
{R^~^^Sp} is bounded for all > 0. Since the function f 1 —>■ with W 2 > 1 

is Lipschitz continuous on every bounded subset of R+, there exists cl > 0 
with I(K^'+^Sp +^)“2 — (K;^+^Sp)“^| < crR^^^lSp'^^ — Sp\ and hence, using the 
assumption of the boundedness of and W 2 > 1 as well as setting cl := 

clSupl>i < c», we obtain 

_L.+l|(.fc+l)., _ ^ _ |fc| ^ ^^55) 

We recall the formula $|\max(a, b) - \max(c, d)| \le |a - c| + |b - d|$ for all

a,b,c,d G M. Therefore, we have — 5^1 0 due to (1551) . (Hill . (1154)) . 

(1152)1 . (1151)1 and ()150)) . Furthermore, due to (1571) and (1551) . we obtain 

1^^' - -F^\ + \F{xk) - F{xk+i)\ + ■ 

Multiplying this last inequality with > 0 (due to (jH])) and using (1155)1 . 
the boundedness of ()152l) . ()151l) and (11501) yields 0 

and R^^^\F{xk) — F{xk+i)\ 0. Therefore, using (1551) . we obtain the desired 
result. □ 
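
The elementary estimate $|\max(a, b) - \max(c, d)| \le |a - c| + |b - d|$ used in the preceding proof is easy to test by brute force. The sketch below enumerates all quadruples from a small, arbitrarily chosen list of values (an assumption of ours) and collects any violations.

```python
from itertools import product

# Arbitrary sample values (an assumption); they include negative, zero and positive numbers.
vals = [-2.0, -0.5, 0.0, 0.75, 3.0]

# Collect every quadruple (a, b, c, d) that would violate the estimate.
violations = [
    (a, b, c, d)
    for a, b, c, d in product(vals, repeat=4)
    if abs(max(a, b) - max(c, d)) > abs(a - c) + abs(b - d) + 1e-12
]

# The bound is attained, e.g. for b = d: |max(3, 0) - max(-2, 0)| = 3 <= |3 - (-2)| + 0 = 5
# is strict, while a = c + t, b = d + t with t > 0 gives equality up to the factor on one term.
gap_example = abs(max(3.0, 0.0) - max(-2.0, 0.0))
```

The estimate follows from $\max(a, b) \le \max(c, d) + \max(|a - c|, |b - d|)$ and the symmetric inequality with the roles of $(a, b)$ and $(c, d)$ exchanged.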

Lemma 12 (Estimates for zero sequences) Assume $\sigma(\bar x) > 0$. Then the
constants 

c := sup , 6:=^, c := , 

c-=s^p{\gtXl\ + \ 9 p+K''+^gp\) , Ce := cCs max (2c, 1, icCs) . 

A;>1 

(156) 


are finite and there exists k > 0 such that 

ic^ > 4c|Z\fe| + + |(a^+i - a^) + r'^+^{A^+^ - i^)+ 

R'^+\F{xk) - F{xk+i))\ 

ic^ > Ce{\R^^‘^ - + \Ak\ ■ - k'^+^P) 

hold for all k > k. 

Proof. Then c is finite due to (I156L (1 14511 . (114611 and (114811 . Furthermore, we 
have c > 0 (If we had c = 0, then using (I156L (I147|l and (H5)l would imply 
u>fc = 0 for all fc > 1, which is a contradiction to assumption (I^Sll i. Due to 
(UnSD, <y{ x) > 0 and 1 — mu > 0 (cf. the initialization of Algorithm [3]), we 
have c = i where (t(x) > 0 implies c > 0, and Lemma [9] implies 

c < oo. Due to (I156p . (I146I1 . (I148p and the assumption of the boundedness of 
c > 0 is bounded. Therefore, (115611 and (113111 imply 0 < Cq < oo. Since 
4c|Afe| + ^ + |(afc+i-dfc)+«'=+i(A^+i-ifc) + s'=+i(F(xfc)-F(xfe+i))| A 0 
due to (115211 and Lemma [11] and since Ce (| ^ | + | Aj, | • | ^ | + 

l^fc+2 _ ^fc+l|2^| Q |.Q dnai, there exists fc > 0 such that (|I57|| holds 
for all k > k. □ 


Lemma 13 (Estimate with error term) We define for k > 1 

qk-.= H,gfXl, p,:=H,ig^p + R'^+^~g^p) 

Cfc := (2c + |Afc|)g|Efe| + ic^lEfep , := H^+i - Hu . 

Then we have for all v G [0,1] and for all k > 1 

^WHu+igXXiFi^~^)Hk+i{gp~^^FR'^^^gp~^^)\^ < ^\vqk+{^—v)(j)k+Au)\'^+ek ■ 

(158) 

Proof Setting zu '■= Eug^XX obtain Hu+ig^Xi = Qk + zu due to (I157p . 
Setting Zk := Ek{g'; + we obtain Hk+i{g^+^ + R’^+^g^+^) = Pu + 

Ak + Zk due to (115311 and (I157F Furthermore, we estimate for all v G [0,1] 

{izqu + (1 - i^){pk + Ak)Y{vzk + (1 - v)zk) < (2c + \Ak\)\vzk + (1 - v)zk\ 

due to the Cauchy-Schwarz inequality, (115711 and (115611 as well as \vzk + (1 — 
k')zk\ < c\Ek\ due to (I156F Hence we obtain (115811 due to and (115711 . □ 

Lemma 14 (Index construction) Assume $\sigma(\bar x) > 0$ and define

r:=|-§?+*m, r:=ii+f. (159) 

Then there exists a finite index kg € K such that 

Wk>S, tl < to 
in ^ ii ini 

hold for k ■= + ii + i with i € [im, r] fl {0,1 ,... }. 

Proof. We obtain r > ii + im > ii > 0 due to (I159II and the initialization of 
Algorithm [21 Therefore, \ii,r\ is a well-defined interval and since > 0 is a 
natural number (cf. Algorithm [3]) , there exists i G [b, r] fl {0,1,... } C [0, r]. 
Furthermore, [im^f] is a well-defined interval and since > 0 is a natural 
number (cf. Algorithm jS]), there exists i G [im^f\ fl {0,1,...} C [0, r]. The 
assumptions of Proposition [T2] are satisfied — (1^51) holds due to assumption, 

{Hk\ is bounded due (I145L we have Xk x due to (I141L we have <j{x) > 0 
due to assumption and Lemma |21 r > 0 is a fixed number due to (115911 . the 
choice Eg '■= > 0 yields a fixed positive number eg due to (|143ll — and 

therefore we can apply Proposition [181 For r defined in (I159II there exists k > 0 
with 

z^k+i >^=S, tl+^<eo = (162) 

for all fc > fc, fc G AT and for all 0 < i < r due to (11201) and (11561) . Since K 
is an infinite set due to (|141|) {K C {1,2,...,}), we can choose kg G K with 
fcg > max(fc,fc) > k (k was introduced in Lemma IT^ . Hence, (116211 holds in 
particular for all k > ko and hence for k = fcg, i.e. Wko+i ^ ^ ^^nd < ^g”^ 
for all 0 < * < r. Because of t^~^^ < tQ°^^ for all 0 < * < r due to (114311 . we 
obtain 

Wk„+i > S , < 4°+^ (163) 

for all 0 < f < r. Due to (11591) . (11631) holds in particular for all i G [b,r] = 
ii + [0, r] which yields Wko+ii+i > ^ and ^ ^feo+*i+* j g [0, f]. In 

particular, these last two inequalities hold for all i G [im-, r] IH {0,1,...,} and 
now setting k := ko+ii+i yields the desired index and that (11601) holds after 
step 6 (line search) of Algorithm [S] Due to (116311 . we have in 

particular for all 0 < * < *; -I- Zm- Consequently, the case (15^ always occurs 
for the ii + im + 1 subsequent iterations feg -I- 0, ..., fcg -|- zp ..., fcg -|- b -|- 
(Remember: Zn > 0 denotes the number of subsequent short and null steps 
according to the initialization of Algorithm [3|) and therefore (dm) holds at 
the end of iteration fcg -|- z; -|- im (even if the initial value of z„ is zero at the 
beginning of iteration fcg -|- 0) after step 6 (line search) of Algorithm [21 □ 

Lemma 15 (Error estimate) Fork defined in Lemma^^we have Ck < \<? ■ 


( 160 ) 

( 161 ) 

Proof. Since in > *; + *m due to (I161II and since increases at most by one
at each iteration due to (I5il) . we have at iteration k at least in > */ + im 
and hence either the case (I34II or ((35ll occurs (at iteration k). Furthermore, 
since in > H + im due to (I161L the cases (15^ and (1551) occur at iteration 
k + 1. Therefore, combining these facts yields Ek = due 

to HSZI), dUl), (HU), the fact that + X'f = I = EjeJ^+i 

and (11301) . Since is bounded and is bounded as well as 

uniformly positive definite (by assumption), we can make use of Proposition 
[23] and hence we obtain |£ifc| < due to (I131II . Consequently, 

we obtain the desired estimate due to (IT^ . dUi) and (HHj). □

Lemma 16 (Termination criterion estimate) For k defined in Lemma 
M a short or null step which changes the model of the objective function is 
executed and 

Wk+i < - dp) + k'“'+^(^p+^ - Ap) + K’^+^{F{xk) - F(a;fc+i))| + Cfc 

+ min X\,^qk + (1 - v){j>k + Ak)\'^ + va^k^l + (1 - v)Tk ■ 

Proof. Combining (11601) with step 6 (line search) of Algorithm |3| and consider¬ 
ing the case in > b + *m > ii due to (fTHTI) in the line search (Algorithm Hj) , we 
obtain that at iteration k a short or null step which changes the model of the 
objective function is executed. Furthermore, ig is unchanged (since no serious 
step is executed), i.e. ig < ir (no bundle reset) still holds (If ig > E, then we 
would have had a serious step at iteration k, as a bundle reset can only occur 
after a serious step). Therefore, {X, Xp, fi, fip) € Jfc+i|-l-i) -^^ith 

._ f ly for j = k + 1 1 

( 0 for j e Jfc+i \ {/c -f 1} J ’ 

Xp := 1 — ly , pLj := 0 for all j € Jk+i , Fv ~ , (164) 

where u € [0,1], is feasible for the (fc -I- l)st (dual) problem (I^TI) (Note: This 
problem is written as a minimization problem) and, hence, due to (I82|) . (I81|) . 
(l43)) . (1^ . (IMl) . (l42l) . inserting the feasible point from (11641) . (11471) . taking 
into account that v e [0,1] and (11581) . we estimate (Note: Wk+i iii ItSTl) is the 
optimal function value of (HD)) 

Wk+i < \\vqk + (1 - v){pk F 4\fc)P FckF va'lXl + (1 - v)Tk 

+ |(a^+i - al) + k"+i(A^+i - AD + k'=+i(F(xfe) - A(a;,+i))| 

and consequently, since v G [0,1] is arbitrary, we obtain the desired estimate. 

□ 

Lemma 17 (Termination criterion is shrinking) For k defined in Lemma 
M we have Wk+i < Wk — c^. 

Proof. Since for p := pk, g := Qk, ^ := v := Vk - +

XpG^ + PjG’j + PpG^)dk, w := Wk, (3 := oil'll, m := ttir and a := Tk, the 
assumptions of Proposition [T^ are satisfied and since we have = 2c^ 

due to (11561) . now applying Proposition [TOl yields the desired estimate due to 
Lemma [Tl)l (I157|) . Lemma fT^ and (11601) . □ 

Lemma 18 (Contradiction) For kg from Lemma IT^ vie have Wko+n+i < 0. 

Proof. Set n := uiaXz<r,zG{o,i,...} ^ (Note that f > 0 due to (11591) 1. then we 
have n+1 > r and hence (115911 implies — c^(n + l —< — |c^. Now, applying 
Lemma [I2l( n — im) + 3 times as well as using (j4^ . (I147|) and (11561) yields 

Wko+n+l < Wko+I^ - (n + 1 - < Wfeo+im - = 0 • n 

Theorem 8 Let Assumption\^ be satisfied. Then there exists R € M>o such 
that holds for (x,k), i.e. each accumulation point of the sequence of itera¬ 
tion points {xfe} is stationary for the optimization problem (P)) . 

Proof (by contradiction). Since $\{(x_k, \bar\kappa^{k+1})\}$ is bounded and $\{(W_p^k)^{-\frac12}\}$ is
uniformly positive definite (both due to assumption), the statement follows from
Proposition 16 if we can show $\sigma(\bar x) = 0$. We suppose this is false, i.e. we have
due to (103) $\sigma(\bar x) > 0$ or $\sigma(\bar x) = \infty$. Due to Assumption 7 we can make use
of Lemmas 7 to 9, which implies that only the case $\sigma(\bar x) > 0$ occurs. Therefore,
we can use Lemmas 10 to 18, which yields a contradiction to the non-negativity
of $w_k$ for all $k \ge 1$ due to (82). □

Remark 10 In examples that do not satisfy the nonsmooth constraint qualification
m, $\bar\kappa^{k+1}$ became very large in Algorithm [3] (note that Theorem [5] has
in particular the assumption that $\{\bar\kappa^{k+1}\}$ is bounded).

The assumption (142) of Theorem [5] was satisfied in all numerical examples
in Fendl & Schichl [2 in which the termination criterion of the algorithm
was satisfied.

If $t_0$ is only modified in, e.g., finitely many iterations of the algorithm, then
(143) is satisfied (cf. Remark [C|).

For an unconstrained optimization problem we obtain in the proof of
Lemma [Td that $E_k = 0$, which implies that $c_k = 0$ due to (157). Therefore,
Lemma no is trivially satisfied in the unconstrained case, since $c$ from (150) is
positive.

If we demand that all assumptions in the proof of convergence, which we
imposed on $W_p^k$, are satisfied for $\sum_{j \in J_k} \lambda_j^k G_j^k + \lambda_p^k G_p^k$, then the convergence
result also holds in the case $W_p^k = 0$. This is important, since first numerical
results in the unconstrained case showed a better performance for the choice
$W_p^k = 0$. The reason is that otherwise, for a smooth, convex objective
function $f$, the Hessian information in the QCQP (ITHl) is distorted. This can
be seen by putting the constraints of the QCQP (17(11) into its objective function,
which is then given by


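\[
\max_{j \in J_k} \bigl( -\alpha_j^k + d^T g_j^k + \tfrac{1}{2} d^T G_j^k d \bigr) + \tfrac{1}{2} d^T W_p^k d
= \max_{j \in J_k} \bigl( -\alpha_j^k + d^T g_j^k + \tfrac{1}{2} d^T (G_j^k + W_p^k) d \bigr) .
\]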
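
The distortion of the Hessian information can be made concrete in one dimension. The sketch below is an illustration under stated assumptions, not the authors' implementation: it uses a scalar quadratic $f(x) = \tfrac{1}{2} a x^2$, a single bundle element with zero linearization error, and the hypothetical choice `w_p = a` for the nonzero case. All function and variable names are introduced here for illustration only.

```python
from typing import Iterable, Tuple

# One bundle element: (alpha_j, g_j, G_j), all scalars in this 1-D sketch.
Piece = Tuple[float, float, float]


def model_value(d: float, pieces: Iterable[Piece], w_p: float) -> float:
    """Pointwise maximum of the quadratic pieces with curvature G_j + w_p."""
    return max(-alpha + d * g + 0.5 * (G + w_p) * d * d for alpha, g, G in pieces)


def single_piece_step(g: float, G: float, w_p: float) -> float:
    """Minimizer of d * g + 0.5 * (G + w_p) * d**2 for positive curvature."""
    curvature = G + w_p
    if curvature <= 0:
        raise ValueError("need G + w_p > 0")
    return -g / curvature


# Assumed example: f(x) = 0.5 * a * x**2 evaluated at x = 3, minimizer at 0.
a, x = 2.0, 3.0
g, G = a * x, a

newton_step = single_piece_step(g, G, 0.0)  # uses the exact curvature a
damped_step = single_piece_step(g, G, a)    # curvature is doubled to 2 * a
```

With `w_p = 0.0` the step reaches the minimizer of this quadratic in one iteration, whereas with `w_p = a` the curvature is counted twice and the step covers only half the distance.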


5 Conclusion 


In this paper we investigated the possibility of extending the SQP-approach of
the bundle-Newton method for nonsmooth unconstrained minimization by Lukšan
& Vlček [4^ to nonsmooth nonlinearly constrained optimization problems,
where we did not use a penalty function, a filter or an improvement function
to handle the constraints. Instead, after committing to accept only
strictly feasible points as iteration points (trial points do not need to have
this property), we computed the search direction by solving a convex QCQP
in the hope of obtaining preferably feasible points that yield a good descent.
Since the duality gap for such problems is zero, if the iteration point is strictly 
feasible, we were able to establish a global convergence result under certain 
assumptions. Furthermore, we discussed the presence of $t_0$ in the line search,
we explained why this should not be a problem when we use the solution of 
the QCQP as the search direction and we referred to Fendl & Schichl (l3 |
that this turns out to be true in practice for at least many examples of the
Hock-Schittkowski collection by Schittkowski [63, 64].